How to determine cause of memory leak while making predictions?


We are using Deeplearning4j to make predictions with a model that was trained in Keras and imported into Deeplearning4j. The version is 1.0.0-beta6 [1]. We are using the CPU backend; our CPU supports the AVX2 and AVX512 instruction sets [2].

The code below is called by 40 threads, each of which has its own ComputationGraph (no sharing). As we make predictions, off-heap memory usage keeps increasing, and at some point an OutOfMemoryError occurs:

Physical memory usage is too high: physicalBytes (341G) > maxPhysicalBytes (340G)

I couldn’t find the reason for this. How can I find which part of the code is leaking memory? Or what should I use to prevent memory leaks?

I added a call to destroyAllWorkspacesForCurrentThread() when the 80% threshold limit is reached. However, even though this code is called from each thread, memory usage stays at around 300G (see the logging code below).
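For illustration, here is a minimal sketch of the threshold logic described above. The class and method names are mine, and the two byte counts are plain parameters so the sketch is self-contained; in the real code they would come from org.bytedeco.javacpp.Pointer.physicalBytes() and Pointer.maxPhysicalBytes(), as noted in the comments.

```java
// Hypothetical sketch: decide whether to call
// Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread()
// based on an 80% physical-memory threshold. In the real code,
// physicalBytes would be Pointer.physicalBytes() and maxPhysicalBytes
// would be Pointer.maxPhysicalBytes().
public class WorkspacePressureCheck {
    static final double THRESHOLD = 0.80;

    static boolean shouldDestroyWorkspaces(long physicalBytes, long maxPhysicalBytes) {
        if (maxPhysicalBytes <= 0) {
            return false; // limit unset or check disabled
        }
        return physicalBytes > (long) (maxPhysicalBytes * THRESHOLD);
    }

    public static void main(String[] args) {
        long max = 250L * 1024 * 1024 * 1024;                        // 250G limit from the JVM flags
        System.out.println(shouldDestroyWorkspaces(max / 2, max));   // well below threshold
        System.out.println(shouldDestroyWorkspaces(max - 1, max));   // above threshold
    }
}
```

Note that this only decides *when* to destroy workspaces; as discussed further down in the thread, destroying them turned out to merely delay the problem.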

Program parameters:

 -XX:+UseG1GC -Xms16g -Xmx100g -Dorg.bytedeco.javacpp.maxbytes=240G -Dorg.bytedeco.javacpp.maxphysicalbytes=250G

Running code:

private final WorkspaceConfiguration learningConfig = WorkspaceConfiguration.builder()
        .policyAllocation(AllocationPolicy.STRICT) // <-- this option disables overallocation behavior
        .policyLearning(LearningPolicy.FIRST_LOOP) // <-- the workspace learns its size during the first loop
        .build();

private void predict(ComputationGraph graph, float[] input1, float[] input2) {
    // called by 16 threads
    long start = System.currentTimeMillis();
    try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(learningConfig, "WORKSPACE_ID")) {
        INDArray firstInput = Nd4j.create(input1, SHAPE);
        INDArray secondInput = Nd4j.create(input2, SHAPE);
        long startForPredictions = System.currentTimeMillis();
        INDArray result = graph.output(false, ws, firstInput, secondInput)[0];
        // processing of result is almost a no-op, for testing

        long end = System.currentTimeMillis();
        log.info("Time took {} ms, prediction took {} ms", end - start, end - startForPredictions);
    }
}

Graph import:

ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(
                Paths.get(modelDirectory,  "model.json").toString(),
                Paths.get(modelDirectory, "model_weights.h5").toString());

Logging code:

log.info("Physical bytes used by deeplearning4j: {} ({}), available bytes: {}",
        Pointer.physicalBytes(), Pointer.formatBytes(Pointer.physicalBytes()), Pointer.availablePhysicalBytes());

GC config:


[1] 1.0.0-beta7 gave an error during predictions for the imported model, therefore we delayed migration.
[2] We have declared the nd4j-native avx2 and avx512 dependencies in our pom.xml.
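For reference, such a declaration might look like the fragment below. This is a sketch only: verify the exact classifier names and version for your platform, since I am assuming the standard nd4j-native AVX classifiers here.

```xml
<!-- Sketch (verify classifier names for your platform/version) -->
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>1.0.0-beta6</version>
    <classifier>linux-x86_64-avx2</classifier>
</dependency>
```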

Just update to beta7; a couple of leaks were fixed there.

I was actually planning to update to beta7, but the code that works on beta6 gives the following error when run with beta7. Should I update my model or change the input shapes? Is there a semantic change in the input dimension ordering?

Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidInputException: Received input with size(1) = 30 (input array shape = [2, 30, 1]); input.size(1) must match layer nIn size (nIn = 1)
at org.deeplearning4j.nn.layers.recurrent.LSTMHelpers.activateHelper(
at org.deeplearning4j.nn.layers.recurrent.LSTM.activateHelper(
at org.deeplearning4j.nn.layers.recurrent.LSTM.activate(
at org.deeplearning4j.nn.layers.recurrent.LastTimeStepLayer.activate(
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(
at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(
at org.deeplearning4j.nn.graph.ComputationGraph.output(
at org.deeplearning4j.nn.graph.ComputationGraph.output(
at org.example.ModelImport.testCase(
at org.example.ModelImport.main(

As you are importing a model, you have likely run into the change that imports now use the model’s channel order instead of converting to DL4J’s default order.

You can use permute to change the order of your inputs to match the expected format.
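As a self-contained illustration of what that axis swap does, here is the same reordering written out with plain Java arrays (the helper name is mine). With ND4J itself, the equivalent would be a permute of the last two axes, e.g. input.permute(0, 2, 1), which returns a view with the axes reordered.

```java
// Plain-Java illustration of swapping the last two axes, i.e. turning a
// [batch, 1, n] tensor into [batch, n, 1]. This mirrors what
// input.permute(0, 2, 1) does in ND4J.
public class AxisSwap {
    static float[][][] swapLastTwoAxes(float[][][] in) {
        int batch = in.length, d1 = in[0].length, d2 = in[0][0].length;
        float[][][] out = new float[batch][d2][d1];
        for (int b = 0; b < batch; b++)
            for (int i = 0; i < d1; i++)
                for (int j = 0; j < d2; j++)
                    out[b][j][i] = in[b][i][j];
        return out;
    }

    public static void main(String[] args) {
        float[][][] x = new float[2][1][3];   // shape [2, 1, 3]
        x[0][0] = new float[]{1f, 2f, 3f};
        float[][][] y = swapLastTwoAxes(x);   // shape [2, 3, 1]
        System.out.println(y.length + " " + y[0].length + " " + y[0][0].length); // 2 3 1
    }
}
```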

Thank you, I changed the input shape from [BATCH_SIZE, 1, INPUT_SIZE] to [BATCH_SIZE, INPUT_SIZE, 1] and it worked. Now I will check whether beta7 solves my memory leak problem.

Even after updating to beta7, we still see memory grow significantly. Since I am using memory workspaces with the learning and strict policy config, shouldn’t memory usage stay stable after a couple of predictions (output calls)? Should I also clear the ComputationGraph from time to time? Or is there a way to check what is consuming the memory?

A similar bug here, test in progress: off-heap memory doesn’t get freed.

Can you please show the graph that reproduces this problem?

My manager won’t let me share the graph we are using, but I will try to generate a graph that reproduces the problem. Meanwhile, is there any workaround for forcefully freeing off-heap memory? destroyAllWorkspacesForCurrentThread delays the problem, but eventually we still end up with an OutOfMemoryError.

We don’t need your exact model with your weights and data :slight_smile:

All we need is a compositionally equivalent graph, with random weights, a random input size, etc.

I created a model similar to ours, hoping it will replicate the issue. This model is slightly less complicated than ours but has the same characteristics. You can find the model at [1]. The forum does not allow file uploads, so I used a GitHub gist, but it does not allow binary files either; therefore I base64-encoded the weights file. The gist also has the Python code used to generate the model.

cat memory_leak_model_weights.h5 | base64 > base64_encoded_memory_leak_model_weights.h5.txt

You can revert it with the following command:

cat base64_encoded_memory_leak_model_weights.h5.txt | base64 -d > memory_leak_model_weights.h5

Keras version: 2.1.1


When I look at the logs, it says that 250G of physical bytes are used; however, when I check the output of “free -h”, the system reports that only 105G is used and the rest is buff/cache (although the program’s RSS really is 250G).

We are using LMDB via the lmdbjava [1] library. Is it possible that org.bytedeco.javacpp.Pointer incorrectly counts the LMDB memory-mapped file as memory allocated by itself, and therefore fails?

Is there a way to verify this? And if this is the case, how can I bypass the org.bytedeco.javacpp.Pointer memory check?


Yes, Linux appears to count the page cache for memory-mapped files towards the memory used by the process.
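One way to observe this yourself is to read the process’s own VmRSS from /proc and compare it with what Pointer.physicalBytes() reports. The sketch below is Linux-specific (it assumes /proc is mounted) and returns null elsewhere; the class and method names are mine.

```java
// Sketch: read this process's VmRSS line from /proc/self/status
// (Linux only), e.g. to compare against Pointer.physicalBytes().
// Returns null on systems where /proc/self/status does not exist.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RssProbe {
    static String vmRss() {
        Path status = Paths.get("/proc/self/status");
        if (!Files.exists(status)) return null;
        try {
            for (String line : Files.readAllLines(status)) {
                if (line.startsWith("VmRSS")) return line.trim();
            }
        } catch (IOException e) {
            // fall through and report unavailable
        }
        return null;
    }

    public static void main(String[] args) {
        String rss = vmRss();
        System.out.println(rss != null ? rss : "VmRSS not available on this OS");
    }
}
```

Since VmRSS includes resident pages backed by memory-mapped files, a large gap between VmRSS and the JVM heap plus off-heap allocations is consistent with the page-cache explanation above.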

The check can be disabled, of course.

Thank you. As you pointed out, I will now try with -Dorg.bytedeco.javacpp.maxphysicalbytes=0 to disable the check and see if everything works fine.

Disabling the check seems to solve our problem. Thank you to everyone who helped me in this thread.