Is longer-lasting allocation in host memory a possible/intended use case for the workspace concept?

Hello,

I created an iterator, and within its constructor I want to hold the data it is going to serve from the .next(num) method. Hence, the idea was to allocate everything in host memory in one workspace.
I made it work by tagging the workspace for out-of-scope use and toggling workspace use off; this works for now. However, it spits out some warnings (because it's turned off).

Here’s the snippet running in the constructor:

        // Host-only, fixed-size workspace: everything lives in RAM,
        // nothing is mirrored to device memory.
        WorkspaceConfiguration hostBufferConfiguration = WorkspaceConfiguration.builder()
                .initialSize(size)                          // pre-allocate the full buffer up front
                .policyAllocation(AllocationPolicy.STRICT)  // no over-allocation
                .policySpill(SpillPolicy.FAIL)              // fail instead of spilling past initialSize
                .policyLearning(LearningPolicy.NONE)        // fixed size, no learning cycles
                .policyLocation(LocationPolicy.RAM)         // back the workspace with host RAM
                .policyMirroring(MirroringPolicy.HOST_ONLY) // no device-side mirror
                .build();

        this.workspaceBuffer = Nd4j.getWorkspaceManager().createNewWorkspace(hostBufferConfiguration, workspaceName);
        this.workspaceBuffer.enableDebug(debug);

        // Enter the workspace once and migrate the arrays into it...
        this.workspaceBuffer.notifyScopeEntered();
        this.data = data.migrate();
        this.mask = mask.migrate();
        // ...then allow the arrays to be read outside the workspace scope
        this.workspaceBuffer.tagOutOfScopeUse();
        this.workspaceBuffer.toggleWorkspaceUse(false); // never write to this ws again (just a read-only buffer)

The next() method then just reads the INDArrays held in this.data and this.mask.
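
For completeness, the access in next(num) looks roughly like this (a simplified sketch; the cursor field, the 2D shape, and the DataSet wrapping are illustrative, not my exact code):

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.indexing.NDArrayIndex;

    public DataSet next(int num) {
        // get(...) returns views into the host-only arrays; nothing is copied yet
        INDArray features = data.get(NDArrayIndex.interval(cursor, cursor + num), NDArrayIndex.all());
        INDArray featuresMask = mask.get(NDArrayIndex.interval(cursor, cursor + num), NDArrayIndex.all());
        cursor += num;
        // detach() copies the views out of the buffer workspace, so the
        // training loop may pull them into its own (possibly device-side) workspaces
        return new DataSet(features.detach(), null, featuresMask.detach(), null);
    }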

Is that a valid approach or am I abusing the concept?

So what you’re trying to do is force the allocation to be host-only, so you can preload the data into RAM and save loading time while training on a GPU?

I thought I had seen a better solution, but honestly, I can’t find it anymore.

@raver119 is there any better way to do that?

Thanks for the reply. Indeed, that’s what I’m trying to accomplish, since I naturally have more host RAM than GPU memory. My hypothesis, however, is that once the data is in RAM, transferring chunks of it to the GPU happens fairly fast.

What I have found so far is that the above solution leaks into the application’s memory over time when I don’t clear all workspaces with Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();
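
In case it helps: the cleanup I mean looks roughly like this (a sketch; the close() hook on the iterator is my own naming, while destroyWorkspace(...) and destroyAllWorkspacesForCurrentThread() are the actual MemoryWorkspaceManager methods):

    // Sketch: assumes the iterator owns workspaceBuffer and gets closed when done
    public void close() {
        if (this.workspaceBuffer != null) {
            // frees just this workspace's backing memory...
            Nd4j.getWorkspaceManager().destroyWorkspace(this.workspaceBuffer);
            this.workspaceBuffer = null;
        }
        // ...or, more aggressively, drop every workspace of this thread:
        // Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();
    }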

Doing that still leaves me with some off-heap memory (on the GPU, actually) that I can’t explain. It starts with an 871 MB allocation on the GPU as soon as I invoke anything ND4J-related, and grows to ~2 GB.

What I am missing here to analyze the exact behavior over time is a method providing all workspaces across all threads.
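
The closest I found is per-thread enumeration, roughly like this (getAllWorkspacesForCurrentThread() is the actual manager method; the printout is just illustrative):

    // Lists only the *current* thread's workspaces; a cross-thread view
    // doesn't seem to be exposed by the manager.
    for (MemoryWorkspace ws : Nd4j.getWorkspaceManager().getAllWorkspacesForCurrentThread()) {
        System.out.println(ws.getId() + ": currentSize=" + ws.getCurrentSize() + " bytes");
    }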

For future reference: when I used Arbiter, I discovered I had to invoke scopeOutOfWorkspaces for accesses to this preloaded workspace. Apart from that, it seems to work permanently (I did some 20h+ trainings, no hiccups).
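
For clarity, the pattern I mean is the usual try-with-resources form (scopeOutOfWorkspaces() is the actual Nd4j MemoryManager call; the batch slicing is illustrative):

    // Read from the preloaded arrays outside of any active workspace, so a
    // surrounding workspace (e.g. one of Arbiter's) doesn't capture the result
    try (MemoryWorkspace scope = Nd4j.getMemoryManager().scopeOutOfWorkspaces()) {
        INDArray batch = this.data.get(NDArrayIndex.interval(0, batchSize)).dup();
        // ... hand "batch" over to the consumer ...
    }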

@raver119: can you comment on the usage for such a scenario?

Thanks!

I also have this requirement: using host RAM to cache some INDArrays, because host RAM >> GPU RAM. @sascha08-15, were you able to get it working?


I have been focusing on other projects. However, @raver119, it would be good to get an answer to the above question. Thanks in advance!