I’m currently seeing performance issues that only arise in highly threaded environments. I’m using the Linux AVX-512 backend with version 1.0.0-beta7.
Within an individual thread, execution time measured with timers looks fine, but the machine’s overall CPU usage increases drastically while the network is running, and appears to grow exponentially with workload (CPU usage is flat when the DL4J code path is removed).
I’m using ParallelInference to run requests through a small model, with the number of workers set equal to the number of cores on the server.
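For reference, the setup looks roughly like this (simplified sketch; model loading and the exact builder options are omitted, and `features` stands in for a real request's input):

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.parallelism.ParallelInference;
import org.deeplearning4j.parallelism.inference.InferenceMode;
import org.nd4j.linalg.api.ndarray.INDArray;

ComputationGraph model = /* loaded elsewhere */ null;

// One worker per core; each worker holds its own copy of the model.
ParallelInference pi = new ParallelInference.Builder(model)
        .inferenceMode(InferenceMode.BATCHED) // batch requests across threads
        .workers(Runtime.getRuntime().availableProcessors())
        .build();

// Called concurrently from the request-handling threads:
INDArray result = pi.output(features);
```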
Given the above, I suspect this is related to increased GC overhead, or to some per-thread startup cost that somehow escapes my timers. I’m leaning towards a memory / workspace issue, but I’m unsure how to debug it.
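In case it helps, this is how I’ve been sampling cumulative GC time around the workload (plain JDK management beans, no DL4J involved) to check whether collector time actually tracks the CPU spike:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcProbe {
    // Sum collection time (ms) across all collectors. Sample before and
    // after a workload; the delta is GC time spent during the workload.
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        // ... run the inference workload here ...
        long after = totalGcMillis();
        System.out.println("GC millis during workload: " + (after - before));
    }
}
```

So far I haven’t been able to conclusively attribute the extra CPU to GC this way, which is partly why I suspect workspaces instead.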
Please let me know what other information would be helpful for debugging this. Thanks in advance!