How to determine cause of memory leak while making predictions?

kullanici0606 · June 3, 2020, 8:39am

Hi,

We are using Deeplearning4j to make predictions with a model that was trained in Keras and imported into Deeplearning4j. The version is 1.0.0-beta6 [1]. We are using the CPU backend; our CPU supports AVX2 and AVX512 instructions [2].

The code below is called by 40 threads, each of which has its own ComputationGraph (no sharing). As we make predictions, off-heap memory usage keeps increasing and at some point an OutOfMemoryError occurs:

Physical memory usage is too high: physicalBytes (341G) > maxPhysicalBytes (340G)

I couldn’t find the reason for this. How can I find what part of the code is leaking memory? Or what should I use to prevent memory leaks?

I added a call to destroyAllWorkspacesForCurrentThread() when the 80% threshold is reached; however, even though this code is called from each thread, memory usage stays around 300G (see the logging code further below).

Program parameters:

 -XX:+UseG1GC -Xms16g -Xmx100g -Dorg.bytedeco.javacpp.maxbytes=240G -Dorg.bytedeco.javacpp.maxphysicalbytes=250G
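
To confirm that the limits passed on the command line are the ones the JVM actually sees, the two system properties can be read back and converted to bytes at startup. The following is a minimal sketch using only the standard library. The class `LimitParser` and its simplified suffix handling (K, M, G, T as powers of 1024) are assumptions made for illustration, not JavaCPP's own parser.

```java
import java.util.Locale;

final class LimitParser {
    /** Converts strings such as "240G" or "512m" to bytes; a plain number is taken as bytes. */
    static long parseBytes(String text) {
        String s = text.trim().toUpperCase(Locale.ROOT);
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size");
        }
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            case 'T': multiplier = 1L << 40; break;
            default:  multiplier = 1L;       break; // no suffix: value is already in bytes
        }
        String digits = multiplier == 1L ? s : s.substring(0, s.length() - 1);
        return Math.multiplyExact(Long.parseLong(digits.trim()), multiplier);
    }

    public static void main(String[] args) {
        String[] keys = {
            "org.bytedeco.javacpp.maxbytes",
            "org.bytedeco.javacpp.maxphysicalbytes"
        };
        for (String key : keys) {
            String value = System.getProperty(key);
            System.out.println(key + " = "
                    + (value == null ? "(not set)" : value + " -> " + parseBytes(value) + " bytes"));
        }
    }
}
```

Running `main` with the same `-D` flags as the application prints the values the JVM received, which helps when the limit in an error message does not match the limit you expected.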

Running code:

private final WorkspaceConfiguration learningConfig = WorkspaceConfiguration.builder()
        .policyAllocation(AllocationPolicy.STRICT) // <-- this option disables overallocation behavior
        .policyLearning(LearningPolicy.FIRST_LOOP) // <-- this option makes workspace learning after first loop
        .build();

private void predict(ComputationGraph graph, float[] input1, float[] input2) {
    // called by 16 threads.
    long start = System.currentTimeMillis();
    try(MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(learningConfig, "WORKSPACE_ID")) {
        INDArray firstInput = Nd4j.create(input1, SHAPE);
        INDArray secondInput = Nd4j.create(input2, SHAPE);
        long startForPredictions = System.currentTimeMillis();
        INDArray result = graph.output(false, ws, firstInput, secondInput)[0];
        // process is almost equivalent to no op for testing
        process(result);

        long end = System.currentTimeMillis();
        logger.info("Time took {} ms, prediction took {}",  end - start, end - startForPredictions);
    }
}

Graph import:

ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(
                Paths.get(modelDirectory,  "model.json").toString(),
                Paths.get(modelDirectory, "model_weights.h5").toString());

Logging code:

long physicalBytes = Pointer.physicalBytes(); // read once so the raw and formatted values agree
logger.info("Physical bytes used by deeplearning4j: {} ({}), available bytes: {}", physicalBytes, Pointer.formatBytes(physicalBytes), Pointer.availablePhysicalBytes());

GC config:

Nd4j.getMemoryManager().setAutoGcWindow(10000);

[1] 1.0.0-beta7 gave an error during predictions for the imported model, so we delayed the migration.
[2] We have declared the nd4j-native avx2 and avx512 dependencies in our pom.xml.

raver119 · June 3, 2020, 8:42am

Just update to beta7; a couple of leaks were fixed there.

kullanici0606 · June 3, 2020, 8:50am

I was actually planning to update to beta7, but the code that works with beta6 gives the following error when running with beta7. Should I update my model or change the input shapes? Is there a semantic change in the input indexes?

Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidInputException: Received input with size(1) = 30 (input array shape = [2, 30, 1]); input.size(1) must match layer nIn size (nIn = 1)
at org.deeplearning4j.nn.layers.recurrent.LSTMHelpers.activateHelper(LSTMHelpers.java:189)
at org.deeplearning4j.nn.layers.recurrent.LSTM.activateHelper(LSTM.java:177)
at org.deeplearning4j.nn.layers.recurrent.LSTM.activate(LSTM.java:147)
at org.deeplearning4j.nn.layers.recurrent.LastTimeStepLayer.activate(LastTimeStepLayer.java:101)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2380)
at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1741)
at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1670)
at org.example.ModelImport.testCase(ModelImport.java:172)
at org.example.ModelImport.main(ModelImport.java:230)

treo · June 3, 2020, 8:56am

As you are importing a model, you have likely run into the change that imports now use the model’s channel order instead of converting to dl4j’s default order.

You can use permute to change the order of your inputs to match the expected format.

kullanici0606 · June 3, 2020, 11:48am

Thank you. I changed the input shape from [BATCH_SIZE, 1, INPUT_SIZE] to [BATCH_SIZE, INPUT_SIZE, 1] and then it worked successfully. Now I will check whether beta7 solves my memory leak problem.
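
The shape change described above swaps the last two axes. With ND4J this is typically done by calling `permute(0, 2, 1)` on the input array. The sketch below shows the same axis swap on plain Java arrays so that it runs without any dependency; the class and method names are hypothetical, and the input is assumed to be non-empty and rectangular.

```java
final class AxisSwap {
    /** Returns a copy of x with axes 1 and 2 swapped: [batch][a][b] -> [batch][b][a]. */
    static float[][][] swapLastTwoAxes(float[][][] x) {
        int batch = x.length;
        int a = x[0].length;
        int b = x[0][0].length;
        float[][][] out = new float[batch][b][a];
        for (int n = 0; n < batch; n++) {
            for (int i = 0; i < a; i++) {
                for (int j = 0; j < b; j++) {
                    out[n][j][i] = x[n][i][j]; // element (n, i, j) moves to (n, j, i)
                }
            }
        }
        return out;
    }
}
```

For an input of shape [2, 1, 30] the result has shape [2, 30, 1], matching the layout that worked after the upgrade.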

kullanici0606 · June 3, 2020, 2:05pm

Even after updating to beta7, we still see that memory usage increases a lot. Since I am using memory workspaces with the learning and strict policy config, shouldn’t the memory stay stable after a couple of predictions (output)? Should I also clear the ComputationGraph from time to time? Or is there a way to check what is taking the memory?

arnaud22 · June 3, 2020, 2:29pm

Similar bug here, test in progress: off-heap memory is not freed.

raver119 · June 3, 2020, 2:54pm

Can you please show the graph that reproduces this problem?

kullanici0606 · June 4, 2020, 6:07am

My manager won’t let me show the graph we are using, but I will try to generate a graph that reproduces the problem. Meanwhile, is there any workaround for forcefully freeing off-heap memory? destroyAllWorkspacesForCurrentThread delays the problem, but eventually we still end up with an OutOfMemoryError.

raver119 · June 4, 2020, 8:02am

We don’t need your exact model with your weights and data.

All we need is a compositionally equivalent graph, with random weights, random input size, etc.

kullanici0606 · June 4, 2020, 9:11am

I created a model similar to ours, hoping it will replicate the issue. This model is slightly less complicated than ours but has the same characteristics. You can find the model at [1]. I think the forum does not allow file uploads, so I used a GitHub gist, but it does not let me upload binary files either. Therefore I base64-encoded the weights file. The gist also has the Python code to generate the model.

cat memory_leak_model_weights.h5 | base64 > base64_encoded_memory_leak_model_weights.h5

You can revert it with the following command:

cat base64_encoded_memory_leak_model_weights.h5 | base64 -d > memory_leak_model_weights.h5
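
If the `base64` command-line tool is not at hand, the same decoding can be done from Java. This is a minimal sketch using `java.util.Base64`; the MIME decoder is chosen because it ignores the line breaks that `base64` inserts into its output by default. The file names are the ones used in the commands above, the class name is hypothetical, and the whole file is read into memory, which assumes the weights file is small.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

final class WeightsDecoder {
    /** Decodes base64 text that may contain line breaks, as produced by the base64 tool. */
    static byte[] decodeBytes(byte[] encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }

    /** Reads the encoded file, decodes it, and writes the binary result. */
    static void decodeFile(Path encoded, Path decoded) throws IOException {
        Files.write(decoded, decodeBytes(Files.readAllBytes(encoded)));
    }

    public static void main(String[] args) throws IOException {
        decodeFile(Paths.get("base64_encoded_memory_leak_model_weights.h5"),
                   Paths.get("memory_leak_model_weights.h5"));
    }
}
```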

Keras version: 2.1.1

[1] https://gist.github.com/kullanici0606/fb33a2bff52c676d472b79cfa725790f

kullanici0606 · June 5, 2020, 7:02am

When I look at the logs, they say that 250G of physical memory is used; however, when I check the output of the “free -h” command, the system reports that 105G is used and the rest is buff/cache (but the program's RSS really is 250G).

We are using LMDB with the lmdbjava [1] library. Is it possible that org.bytedeco.javacpp.Pointer incorrectly counts the LMDB memory-mapped file as memory allocated by itself and therefore fails?

Is there a way to verify this? Or, if this is the case, how can I bypass the org.bytedeco.javacpp.Pointer memory check?

[1] GitHub - lmdbjava/lmdbjava: Lightning Memory Database (LMDB) for Java: a low latency, transactional, sorted, embedded, key-value store
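
One way to check this on Linux is to split the resident set of the process into anonymous and file-backed pages. Kernels from 4.5 on report `RssAnon`, `RssFile` and `RssShmem` in `/proc/<pid>/status`; pages of a memory-mapped LMDB file would be expected to show up under `RssFile`. The following is a minimal sketch assuming a Linux host; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RssBreakdown {
    /** Extracts the VmRSS and Rss* entries (values in kB) from /proc/<pid>/status lines. */
    static Map<String, Long> parse(List<String> statusLines) {
        Map<String, Long> kb = new LinkedHashMap<>();
        for (String line : statusLines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon);
            if (!key.equals("VmRSS") && !key.startsWith("Rss")) {
                continue;
            }
            // The value looks like "  152043520 kB"; keep only the number.
            String[] parts = line.substring(colon + 1).trim().split("\\s+");
            kb.put(key, Long.parseLong(parts[0]));
        }
        return kb;
    }

    public static void main(String[] args) throws IOException {
        parse(Files.readAllLines(Paths.get("/proc/self/status")))
                .forEach((key, value) -> System.out.println(key + " = " + value + " kB"));
    }
}
```

Calling `parse` from inside the prediction service and logging the result next to `Pointer.physicalBytes()` would show how much of the reported figure is file-backed rather than anonymous memory.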

saudet · June 5, 2020, 7:08am

Yes, Linux appears to count the cache for memory-mapped files towards memory used by the process:

github.com/eclipse/deeplearning4j

PCA calculations not using entire MMAP space

opened 12:24PM - 22 Apr 20 UTC

lukaszbachman

Performance Blas/Lapack

I'm trying to run a simple experiment with ND4J where PCA calculations will be offloaded to an MMAP. In production I plan to use this on large matrices (ex: 30_000 columns, 20_000 rows). For testing I'm using something much smaller to verify if I'm using the API in a proper way. I have read relevant sections related to `INDArrays` and memory usage on DeepLearning4j.org and nd4j.org, but I'm still not getting something.

I have attempted several tests before wrt. the MMAP usage and I noticed that I don't really need to set large `-Xmx` and `-Dorg.bytedeco.javacpp.maxbytes` values. I am using respectively 50mb and 70mb and that allowed me to successfully create MMAPs as large as 8GBs filled with `Nd4j.rand(...)` from the very first byte up to the very end of the file. I also noticed that if I attempt to create a bigger array that won't fit into the MMAP, I may end up with JVM Crash (see #8864) or the MMAP won't be used at all (thanks to @raver119 for giving me a hint about spilled arrays). However what I'm looking at right now looks a bit odd.

First, an example to reproduce:

```
public static void main(String[] args) {
    WorkspaceConfiguration mmap = WorkspaceConfiguration.builder()
            .initialSize(100 * 1024L * 1024L) // 100mbs
            .tempFilePath("/tmp/my.mmap")
            .policyLocation(LocationPolicy.MMAP)
            .build();

    try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(mmap, "M2")) {
        int samplesCount = 1000;
        int featuresCount = 500;
        INDArray matrix = Nd4j.rand(DataType.FLOAT, samplesCount, featuresCount);
        INDArray dup = matrix.dup();
        INDArray factor = PCA.pca_factor(matrix, 0.95, false); // EXCEPTION THROWN HERE
        INDArray reducedMatrix = dup.mmul(factor);
        double[][] reduced = reducedMatrix.toDoubleMatrix();
        System.out.println("Reduced = " + reduced.length + " x " + reduced[0].length);
    }
}
```

Running this code I ended up with this exception:

```
Exception in thread "main" java.lang.OutOfMemoryError: Cannot allocate new LongPointer(1): totalBytes = 264, physicalBytes = 125M
...
Caused by: java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (125M) > maxPhysicalBytes (120M)
...
```

I jumped to hex editor and checked where is the last non-zero byte stored, and it was close to offset `004c5ae0`. This means that around 1_251_000 floats was stored. But it also means that the file had also room for 24_963_403 more (if I'm doing my math right: `(0x6400000−0x004C5AD4) / 4`). So it looks to me that the MMAP had plenty of room left (96%) but for some reason I ran into OOME. I'm running this with:

```
-Xmx50m -Dorg.bytedeco.javacpp.maxbytes=70m
```

I know that likely changing memory settings could bail me out here, but I need to understand what the problem is in order to avoid similar issues in commercial deployments.

#### Version Information

* ND4J 1.0.0-beta6, CPU backend
* Ubuntu 19.04

#### Additional Information

https://gist.github.com/lukaszbachman/57bb2b17b6f4e8b7e00ce5b8935c30a2

![mmap-end](https://user-images.githubusercontent.com/676753/79981395-e635a600-84a4-11ea-999c-b883c152ec08.png)

The check can be disabled, of course: https://deeplearning4j.konduit.ai/config/config-memory

kullanici0606 · June 5, 2020, 7:16am

Thank you. As you pointed out, I will now try with -Dorg.bytedeco.javacpp.maxphysicalbytes=0 to disable the check and see if everything works fine.
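
The effect of setting the limit to 0 can be pictured with the following sketch. It paraphrases the kind of guard that the error message implies and treats a non-positive limit as "no limit". It is an illustration under that assumption, not JavaCPP's actual source, and the class and method names are hypothetical. The second helper expresses the 80% threshold test mentioned earlier in the thread.

```java
final class PhysicalMemoryGuard {
    /** True when a limit is set (> 0) and usage exceeds it; a limit of 0 disables the check. */
    static boolean exceedsLimit(long physicalBytes, long maxPhysicalBytes) {
        return maxPhysicalBytes > 0 && physicalBytes > maxPhysicalBytes;
    }

    /** True when usage has reached the given fraction (for example 0.8) of a positive limit. */
    static boolean reachedThreshold(long physicalBytes, long maxPhysicalBytes, double fraction) {
        return maxPhysicalBytes > 0 && physicalBytes >= (long) (maxPhysicalBytes * fraction);
    }
}
```

With the numbers from the original error, `exceedsLimit(341G, 340G)` is true, while the same usage against a limit of 0 never trips the guard. Note that disabling the guard only removes the check; it does not reduce the memory the process actually uses.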

kullanici0606 · June 5, 2020, 12:25pm

Disabling the check seems to solve our problem. Thank you to all who helped me in this thread.
