Linux performance issues in highly threaded environment

Hi all,

I’m currently seeing performance issues that only arise in highly threaded environments. I’m using the Linux AVX-512 build of the library, version 1.0.0-beta7.

Within an individual thread, the execution time I measured with timers looks fine. However, the machine’s overall CPU usage increases drastically while the network is running, and it appears to grow exponentially with workload. (CPU usage stays flat when the dl4j code snippet is removed.)
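One way to check whether wall-clock timers are hiding CPU work is to record the thread's CPU time next to the wall time. The following is a minimal sketch using only the JDK's `java.lang.management` API; `ThreadCpuProbe` and `placeholderWork` are hypothetical names, and the loop stands in for the real inference call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCpuProbe {

    /** CPU time used by the calling thread in nanoseconds, or -1 if the JVM cannot report it. */
    public static long currentThreadCpuNanos() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            return -1L;
        }
        // Also returns -1 when CPU time measurement is disabled in the JVM.
        return bean.getCurrentThreadCpuTime();
    }

    /** Ratio of CPU time to wall time; NaN when either measurement is unusable. */
    public static double cpuToWallRatio(long cpuNanos, long wallNanos) {
        if (cpuNanos < 0 || wallNanos <= 0) {
            return Double.NaN;
        }
        return (double) cpuNanos / wallNanos;
    }

    /** Hypothetical stand-in for the inference call being timed. */
    public static double placeholderWork(int n) {
        double acc = 0.0;
        for (int i = 1; i <= n; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        long cpuStart = currentThreadCpuNanos();
        long wallStart = System.nanoTime();

        double result = placeholderWork(5_000_000);

        long wallNanos = System.nanoTime() - wallStart;
        long cpuEnd = currentThreadCpuNanos();
        long cpuNanos = (cpuStart < 0 || cpuEnd < 0) ? -1L : cpuEnd - cpuStart;

        // Machine-wide load average for comparison; -1 if the platform does not provide it.
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();

        System.out.printf("result=%.1f wall=%d ns cpu=%d ns ratio=%.2f load=%.2f%n",
                result, wallNanos, cpuNanos, cpuToWallRatio(cpuNanos, wallNanos), load);
    }
}
```

Note that this only accounts for the calling Java thread. Any CPU burned by other threads, including native threads a backend may start on its own, will show up in the machine-wide load but not in this per-thread number, which would be consistent with timers that look fine while total CPU climbs.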

I’m using ParallelInference to run requests through a small model with workers set equal to the number of cores on the server.

Given the above, I suspect this is related to increased GC overhead, or to some execution time added at the beginning of each thread that somehow escapes my timers. I’m leaning towards a memory / workspace issue, but I’m unsure how to debug it.
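To test the GC hypothesis directly, the accumulated collection time reported by the JVM can be sampled before and after a run and compared with the elapsed wall time. This is a rough sketch using only the JDK; `GcOverheadProbe` is a hypothetical name and the allocation loop is a placeholder for the real workload.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverheadProbe {

    /** Accumulated GC time in milliseconds across all collectors (undefined values are skipped). */
    public static long totalGcMillis() {
        long total = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Number of collections across all collectors (undefined values are skipped). */
    public static long totalGcCount() {
        long total = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 if the collector does not report it
            if (c > 0) {
                total += c;
            }
        }
        return total;
    }

    /** Fraction of wall time attributed to GC; 0.0 when no wall time has elapsed. */
    public static double gcFraction(long gcMillis, long wallMillis) {
        return wallMillis <= 0 ? 0.0 : (double) gcMillis / wallMillis;
    }

    public static void main(String[] args) {
        long gcMillisBefore = totalGcMillis();
        long gcCountBefore = totalGcCount();
        long start = System.nanoTime();

        // Placeholder workload (assumption): allocate short-lived garbage.
        long checksum = 0L;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024];
            junk[0] = (byte) i;
            checksum += junk[0];
        }

        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcMillisBefore;
        long gcCount = totalGcCount() - gcCountBefore;

        System.out.printf("checksum=%d wall=%d ms gc=%d ms (%d collections) fraction=%.3f%n",
                checksum, wallMillis, gcMillis, gcCount, gcFraction(gcMillis, wallMillis));
    }
}
```

These counters only cover the JVM heap. Since ND4J keeps most array data off-heap, a low GC fraction here would not rule out a memory or workspace problem; it would only suggest that heap collection is not the main cost. Running the JVM with `-verbose:gc` (or `-Xlog:gc` on Java 9+) gives the same information per collection.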

Please let me know what other information would help me debug this. Thanks in advance!

@cgrabows could you let us know your numbers with the newest release, 1.0.0-M1.1? There are more classifiers to choose from as well. See the release notes:

You can find available classifiers here:
https://repo1.maven.org/maven2/org/nd4j/nd4j-native/1.0.0-M1.1/

You can also find new docs here: