List<INDArray> featureList = new ArrayList<>();
When I add INDArray instances to featureList, they are cached in GPU RAM. How can I cache them in HOST RAM instead?
@SidneyLann everything is already bidirectionally allocated/cached automatically, in an async manner. The only thing you can really control is when the data is synced. If you need to make sure the data is synced, just call:
Nd4j.getExecutioner().commit();
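For context, here is a minimal sketch of where that call fits (assuming ND4J is on the classpath; the array contents are arbitrary):

```java
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

INDArray arr = Nd4j.createFromArray(1.0f, 2.0f, 3.0f);
arr.addi(1.0);                  // the op may execute asynchronously on the device
Nd4j.getExecutioner().commit(); // block until all queued ops have completed
// the host-side buffer of arr is now guaranteed to be in sync
```

After commit() returns, reading the array from the Java side reflects all operations issued before the call.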
List<INDArray> featureList = new ArrayList<>();
INDArray output = sameDiff.output(…);
featureList.add(output);
This output INDArray is NOT released from GPU RAM, and after a while I hit the error: Allocation failed: [[DEVICE] allocation failed; Error code: [2]]. How can I release it from GPU RAM and cache it only in HOST RAM? Thanks.
Ok, so what you actually want to do is release the memory from the gpu. Could you elaborate on your end-to-end use case a bit?
Also, what’s the actual root problem? You’re running out of gpu memory during training?
Let’s try to discuss the actual problem you’re having rather than what you think a good workaround might be. We might be able to come up with a few different things you can try.
My work steps:
- Use a BERT model via SameDiff to get an output (the minibatch size can only be 1)
- Concat 32 outputs from step 1 to train my model
So I need to cache the outputs from step 1, but only in HOST RAM, not in GPU RAM. How can I do that? Thanks.
If I comment out the line featureList.add(output);, the output is not cached in HOST RAM or GPU RAM, and I do NOT hit the error: Allocation failed: [[DEVICE] allocation failed; Error code: [2]].
@SidneyLann if all you want to do is save memory, then I would suggest calling System.gc() each epoch if you are pushing the limits of the gpu.
That will mostly ensure GC happens quickly.
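As a sketch, the per-epoch call could look like this (the epoch count and the training body are placeholders for your own code):

```java
int nEpochs = 3;   // placeholder: your real epoch count
int completed = 0;
for (int epoch = 0; epoch < nEpochs; epoch++) {
    // ... fit the model / run your training step for one epoch here ...
    System.gc(); // hint the JVM to reclaim unreferenced INDArrays promptly
    completed++;
}
```

System.gc() is only a hint to the JVM, but in practice it helps release off-heap buffers backing garbage INDArrays sooner.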
Beyond that, I would look into running things within a configured workspace if you want to do that. See our docs for more about memory management: https://deeplearning4j.konduit.ai/config/config-memory
The bottom section should be relevant for you.
basicConfig = WorkspaceConfiguration.builder()
        .policyAllocation(AllocationPolicy.STRICT)
        .policyLearning(LearningPolicy.FIRST_LOOP)
        .policyMirroring(MirroringPolicy.HOST_ONLY)
        .policySpill(SpillPolicy.EXTERNAL)
        .build();
...
    outIdxsArr = Nd4j.createFromArray(outIdxs);
    outMaskArr = Nd4j.createFromArray(outMask);
    outSegmentIdArr = Nd4j.createFromArray(outSegment);
    placeholders.put("Placeholder", outIdxsArr);
    placeholders.put("Placeholder_2", outSegmentIdArr);
    placeholders.put("Placeholder_1", outMaskArr);
    output = output1(placeholders);
    featureTmp.add(output);
}
try (MemoryWorkspace ws = Nd4j.getWorkspaceManager().getAndActivateWorkspace(basicConfig, "OTHER_ID")) {
    features.add(Nd4j.vstack(featureTmp).detach());
    labels.add(Nd4j.createFromArray(outLabel).detach());
} catch (Exception e) {
    e.printStackTrace();
}
Are the INDArrays in features and labels allocated ONLY in CPU RAM? How can I allocate INDArrays ONLY on the JVM heap, not off-heap?
I have 32g of CPU RAM and 11g of GPU RAM, but off-heap only uses 11g and the JVM heap uses < 3g here; I want to use more CPU RAM to cache INDArrays.
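One knob that may help here (this is a sketch, so double-check against the memory docs linked above): ND4J's off-heap limit is controlled by the JavaCPP system properties `org.bytedeco.javacpp.maxbytes` and `org.bytedeco.javacpp.maxphysicalbytes`, so it can be raised independently of the JVM heap. The jar name and main class below are hypothetical placeholders:

```shell
# Hypothetical launch: ~3g JVM heap, up to 20g off-heap for INDArray buffers
java -Xmx3g \
     -Dorg.bytedeco.javacpp.maxbytes=20G \
     -Dorg.bytedeco.javacpp.maxphysicalbytes=24G \
     -cp myapp.jar com.example.TrainingJob
```

With a higher maxbytes, host-side (off-heap) buffers can use more of your 32g of CPU RAM instead of being capped near the heap size.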