Hi everyone,
GPU issue:
After training a model with DL4J on the GPU, the allocated GPU memory is not freed when training is done. (The Java app doing the training stays alive after training; the GPU memory is only freed once this application is terminated.) Any ideas why this could be the case? Do we need to do something manually here in order to free this memory? (ps: we are not using UIServer)
Thanks in advance & cheers
Reto
Hi Reto
There are a few different types of GPU memory, and they are released at different times:
a) Native libraries - code for various operations, including all the nd4j/libnd4j ops, plus libraries like cuBLAS and cuDNN. This memory is only released when the process is terminated.
b) Network parameters - this memory is only released once the network is garbage collected, i.e., no references to the network remain in your code and the Java garbage collector runs.
myNetwork.params().close()
is also an option to do it manually.
c) Workspace memory (for activations, gradients, etc.) - released when a thread is GC’d. For the main thread, you can call Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread(); manually.
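Putting (b) and (c) together, a minimal cleanup sketch after training might look like the following. The method and class names here are placeholders, not part of any DL4J API; only the two cleanup calls come from the points above.

```java
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.factory.Nd4j;

public class GpuCleanup {

    // Hypothetical helper: call this from the training thread once
    // training is complete and the network is no longer needed.
    static void releaseAfterTraining(MultiLayerNetwork net) {
        // (b) Release the parameter memory explicitly instead of
        // waiting for the network to be garbage collected.
        net.params().close();

        // (c) Release workspace memory (activations, gradients, etc.)
        // owned by the current thread.
        Nd4j.getWorkspaceManager().destroyAllWorkspacesForCurrentThread();
    }
}
```

Note that (a), the native library memory, is still only released when the process exits, so some GPU memory will remain allocated as long as the JVM is alive.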
Hi Alex, thanks for the lengthy response! Cheers Reto
Hi @AlexBlack,
Does this behaviour also apply to the CPU backend?
Backend used: [CPU]
Cheers.