How do I fix this possible memory leak?

Hi,

I tried the test code below to see whether I can clear my workspace memory. However, the memory is not released. Also, reusing the same workspace ID after destroying the workspaces results in even more memory usage.

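import org.bytedeco.javacpp.Pointer;
import org.junit.Test;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.AllocationPolicy;
import org.nd4j.linalg.api.memory.enums.LearningPolicy;
import org.nd4j.linalg.api.memory.enums.SpillPolicy;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

@Test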
public void testWithMemoryWorkSpacesDestroyAllForCurrentThread() throws InterruptedException {
    Nd4j.create(1);

    System.out.println("Physical bytes - before: " + physicalBytes());
    try (final MemoryWorkspace ignored = Nd4j.getWorkspaceManager().getAndActivateWorkspace(
            memoryConfig(), "TestWorkSpaceID")) {
        INDArray arr = Nd4j.create(DataType.FLOAT, 100_000_000L);    // 100M floats * 4 bytes = ~400MB
    }
    System.out.println("Physical bytes - after 1x: " + physicalBytes());
    try (final MemoryWorkspace ignored = Nd4j.getWorkspaceManager().getAndActivateWorkspace(
            memoryConfig(), "TestWorkSpaceID")) {
        INDArray arr = Nd4j.create(DataType.FLOAT, 100_000_000L);    // 100M floats * 4 bytes = ~400MB
    }
    System.out.println("Physical bytes - after 2x: " + physicalBytes());
    try (final MemoryWorkspace ignored = Nd4j.getWorkspaceManager().getAndActivateWorkspace(
            memoryConfig(), "TestWorkSpaceID")) {
        INDArray arr = Nd4j.create(DataType.FLOAT, 100_000_000L);    // 100M floats * 4 bytes = ~400MB
    }
    System.out.println("Physical bytes - after 3x: " + physicalBytes());
    Thread.sleep(1000);
    System.out.println("Physical bytes - before destroy: " + physicalBytes());
    Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();
    System.out.println("Physical bytes - after destroy: " + physicalBytes());
    Thread.sleep(1000);
    System.out.println("Physical bytes - after close + 1 sec: " + physicalBytes());

    try (final MemoryWorkspace ignored = Nd4j.getWorkspaceManager().getAndActivateWorkspace(
            memoryConfig(), "TestWorkSpaceID")) {
        INDArray arr = Nd4j.create(DataType.FLOAT, 100_000_000L);    // 100M floats * 4 bytes = ~400MB
    }
    System.out.println("Physical bytes - after 4x: " + physicalBytes());

    System.gc();
    System.out.println("Physical bytes - after gc1: " + physicalBytes());
    System.gc();
    System.out.println("Physical bytes - after gc2: " + physicalBytes());
}

private WorkspaceConfiguration memoryConfig() {
    return WorkspaceConfiguration.builder()
            .initialSize(5 * 1024 * 1024 * 1024L) // 5GB
            .policyAllocation(AllocationPolicy.STRICT)
            .policyLearning(LearningPolicy.FIRST_LOOP)
            .policySpill(SpillPolicy.FAIL)
            .maxSize(5 * 1024 * 1024 * 1024L)
            .build();
}

private String physicalBytes() {
    return Pointer.physicalBytes() / (1024 * 1024) + "MB";
}
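A note on the size literals in memoryConfig(): Java evaluates 5 * 1024 * 1024 * 1024L left to right, so the first multiplications happen in int (5,242,880, which still fits) and only the final multiplication by the long literal 1024L widens the result. The expression is therefore correct, but fragile: dropping the L silently overflows to 1GB. The plain-Java sketch below (no ND4J; the class name SizeLiterals is hypothetical) shows the three variants side by side.

```java
public class SizeLiterals {
    // Same expression as in memoryConfig(): 5 * 1024 * 1024 is computed in int (no overflow yet),
    // then the trailing 1024L promotes the product to long.
    static long withLongSuffix() {
        return 5 * 1024 * 1024 * 1024L;
    }

    // Without the L suffix everything is int arithmetic, so 5 * 2^30 wraps modulo 2^32.
    static long withoutLongSuffix() {
        return 5 * 1024 * 1024 * 1024;
    }

    // Less fragile: start from a long so every multiplication is done in long.
    static long explicitLong() {
        return 5L * 1024 * 1024 * 1024;
    }

    public static void main(String[] args) {
        System.out.println(withLongSuffix());    // 5368709120
        System.out.println(withoutLongSuffix()); // 1073741824 (wrapped, only 1GB)
        System.out.println(explicitLong());      // 5368709120
    }
}
```

Writing 5L * 1024 * 1024 * 1024 (or a named constant) removes the dependence on where the L happens to appear.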

Here’s the output:

Physical bytes - before: 480MB
Physical bytes - after 1x: 5600MB
Physical bytes - after 2x: 5600MB
Physical bytes - after 3x: 5600MB
Physical bytes - before destroy: 5600MB
Physical bytes - after destroy: 5389MB
Physical bytes - after close + 1 sec: 5389MB
Physical bytes - after 4x: 10509MB
Physical bytes - after gc1: 10507MB
Physical bytes - after gc2: 10505MB
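The numbers line up with the workspace size rather than the array size. Each Nd4j.create(DataType.FLOAT, 100_000_000L) call needs 100,000,000 × 4 bytes, about 381MB, which fits easily inside the 5GB (5120MB) initialSize, so the first block is consistent with one 5GB workspace being allocated and reused by blocks 2 and 3. destroyAllWorkspacesForCurrentThread() released only about 211MB, and the jump after the fourth block is exactly another 5120MB, consistent with a fresh workspace being created on top of the old one. The plain-Java arithmetic below (hypothetical class name; the MB values are copied from the output above) checks this.

```java
public class WorkspaceArithmetic {
    static final long MB = 1024L * 1024L;

    // Bytes needed by a FLOAT array: 4 bytes per element.
    static long floatArrayBytes(long elements) {
        return elements * Float.BYTES;
    }

    // initialSize from memoryConfig(), converted to MB.
    static long workspaceMb() {
        return 5L * 1024 * 1024 * 1024 / MB;
    }

    public static void main(String[] args) {
        long arrayMb = floatArrayBytes(100_000_000L) / MB;
        System.out.println("array: " + arrayMb + "MB");           // array: 381MB
        System.out.println("workspace: " + workspaceMb() + "MB"); // workspace: 5120MB

        // Values copied from the output above.
        long before = 480, after1x = 5600, afterDestroy = 5389, after4x = 10509;
        // First block: the growth equals one full workspace, not one array.
        System.out.println(after1x - before == workspaceMb());    // true
        // Memory released by destroyAllWorkspacesForCurrentThread().
        System.out.println("released by destroy: " + (after1x - afterDestroy) + "MB"); // released by destroy: 211MB
        // Fourth block: the growth equals another full workspace.
        System.out.println(after4x - afterDestroy == workspaceMb()); // true
    }
}
```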

How do I clear/free/deallocate my workspace and all the INDArrays created inside it?

-Bhathiya

I’ll check what’s wrong with this method. In the meantime, you can get your workspace by its ID and call destroyWorkspace() on it.

I tried the code below to get and destroy the workspace after using it, but then I cannot use it again. It seems Nd4j.getWorkspaceManager().getAndActivateWorkspace doesn’t create the workspace again, and the error raised says “Can't allocate memory: Workspace is full”.

@Test
public void testWithMemoryWorkSpacesWithDestroy() throws InterruptedException {
    Nd4j.create(1);

    System.out.println("Physical bytes - before: " + physicalBytes());
    try (final MemoryWorkspace ignored = Nd4j.getWorkspaceManager().getAndActivateWorkspace(
            memoryConfig(), "TestWorkSpaceID")) {
        INDArray arr = Nd4j.create(DataType.FLOAT, 100_000_000L);    // 100M floats * 4 bytes = ~400MB
    }
    System.out.println("Physical bytes - after 1x: " + physicalBytes());
    try (final MemoryWorkspace ignored = Nd4j.getWorkspaceManager().getAndActivateWorkspace(
            memoryConfig(), "TestWorkSpaceID")) {
        INDArray arr = Nd4j.create(DataType.FLOAT, 100_000_000L);    // 100M floats * 4 bytes = ~400MB
    }
    System.out.println("Physical bytes - after 2x: " + physicalBytes());
    try (final MemoryWorkspace ignored = Nd4j.getWorkspaceManager().getAndActivateWorkspace(
            memoryConfig(), "TestWorkSpaceID")) {
        INDArray arr = Nd4j.create(DataType.FLOAT, 100_000_000L);    // 100M floats * 4 bytes = ~400MB
        System.out.println("Physical bytes - after 3x: " + physicalBytes());
        ignored.destroyWorkspace();
        System.out.println("Physical bytes - after workspace.destroyWorkspace(): " + physicalBytes());
    }
    Thread.sleep(1000);

    System.out.println("Physical bytes - after close + 1 sec: " + physicalBytes());

    try (final MemoryWorkspace ignored = Nd4j.getWorkspaceManager().getAndActivateWorkspace(
            memoryConfig(), "TestWorkSpaceID")) {
        INDArray arr = Nd4j.create(DataType.FLOAT, 100_000_000L);    // 100M floats * 4 bytes = ~400MB
    }
    System.out.println("Physical bytes - after 4x: " + physicalBytes());

    System.gc();
    System.out.println("Physical bytes - after gc1: " + physicalBytes());
    System.gc();
    System.out.println("Physical bytes - after gc2: " + physicalBytes());
}

With this code, the allocation fails on the 4th try:

Physical bytes - before: 538MB
Physical bytes - after 1x: 5658MB
Physical bytes - after 2x: 5658MB
Physical bytes - after 3x: 5658MB
Physical bytes - after workspace.destroyWorkspace(): 538MB
Physical bytes - after close + 1 sec: 538MB

org.nd4j.linalg.exception.ND4JIllegalStateException: Can't allocate memory: Workspace is full

	at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.alloc(Nd4jWorkspace.java:442)
	at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.alloc(Nd4jWorkspace.java:322)
	at org.nd4j.linalg.api.buffer.BaseDataBuffer.<init>(BaseDataBuffer.java:818)
	at org.nd4j.linalg.api.buffer.FloatBuffer.<init>(FloatBuffer.java:58)
	at org.nd4j.linalg.api.buffer.factory.DefaultDataBufferFactory.create(DefaultDataBufferFactory.java:326)
	at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1455)
	at org.nd4j.linalg.api.ndarray.BaseNDArray.<init>(BaseNDArray.java:341)
	at org.nd4j.linalg.cpu.nativecpu.NDArray.<init>(NDArray.java:185)
	at org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory.create(CpuNDArrayFactory.java:231)
	at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4274)
	at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3962)
	at xxxxxx.testWithMemoryWorkSpacesWithDestroy(ClassifyFromMultipleThreadsTest.java:287)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)

I get the same result if I use:
Nd4j.getWorkspaceManager().getWorkspaceForCurrentThread("TestWorkSpaceID").destroyWorkspace();

Is this behaviour expected?
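One way to picture the behaviour described above: in the second test, destroyWorkspace() did free the full 5120MB (5658MB down to 538MB), yet the next getAndActivateWorkspace(...) with the same ID failed with “Workspace is full”. That is what you would see if the destroyed instance stayed registered under its ID and was handed back with no memory behind it, so a STRICT workspace with SpillPolicy.FAIL rejects the next allocation. The plain-Java toy below models only that hypothesis; ToyWorkspace, ToyRegistry and their methods are hypothetical names, and this is not ND4J’s implementation. Whether ND4J actually keeps the destroyed instance registered is an assumption here.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyWorkspaceModel {
    // Hypothetical fixed-size workspace: STRICT allocation, fail instead of spilling.
    static final class ToyWorkspace {
        private long capacity;
        private long used;

        ToyWorkspace(long capacity) { this.capacity = capacity; }

        void alloc(long bytes) {
            if (used + bytes > capacity) {
                throw new IllegalStateException("Can't allocate memory: Workspace is full");
            }
            used += bytes;
        }

        // Leaving the scope resets the offset so the memory is reused next time.
        void close() { used = 0; }

        // Releases the backing memory but leaves the object itself alive.
        void destroy() { capacity = 0; used = 0; }
    }

    // Hypothetical per-thread registry keyed by workspace ID.
    static final class ToyRegistry {
        private final Map<String, ToyWorkspace> byId = new HashMap<>();

        // Returns the cached instance if present, even if it was already destroyed.
        ToyWorkspace getOrCreate(String id, long capacity) {
            return byId.computeIfAbsent(id, k -> new ToyWorkspace(capacity));
        }

        // Destroys the workspace AND forgets it, so the next getOrCreate builds a fresh one.
        void destroyAndRemove(String id) {
            ToyWorkspace ws = byId.remove(id);
            if (ws != null) {
                ws.destroy();
            }
        }
    }

    // Allocates inside the workspace scope; returns false if the allocation was rejected.
    static boolean tryAlloc(ToyWorkspace ws, long bytes) {
        try {
            ws.alloc(bytes);
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            ws.close();
        }
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();
        long capacity = 5L << 30, array = 400_000_000L;

        ToyWorkspace ws = registry.getOrCreate("TestWorkSpaceID", capacity);
        System.out.println(tryAlloc(ws, array)); // true
        ws.destroy(); // memory released, but the instance is still registered under its ID
        System.out.println(tryAlloc(registry.getOrCreate("TestWorkSpaceID", capacity), array)); // false

        registry.destroyAndRemove("TestWorkSpaceID");
        System.out.println(tryAlloc(registry.getOrCreate("TestWorkSpaceID", capacity), array)); // true
    }
}
```

The toy only illustrates why reusing an ID after destroying the workspace could fail; the maintainer’s reply reports a bug but does not say where it lies.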

So, indeed I’ve found a bug, but it’s not reproducible with your code.

I’m not sure why it reports 10GB, but in my branch I’m able to run an endless loop and allocate a 5GB workspace in each iteration.

Here’s the output:

As you can see, there are lots of iterations at 5GB each, and it still reports 10GB. I don’t know why yet.