When I used multiple GPUs to train VGG and increased the batch size, the program failed with out-of-memory errors. The same amount of memory is sufficient in TensorFlow. The pictures below show the errors.
"This happens in TensorFlow" and "this happens in DL4J" doesn't really help us in any way. We don't know what you were using or what the context was.
Could you give us something we can run to compare for ourselves?
Right, I know what your error is. I want the code you were using in TensorFlow to compare. Give me two side-by-side runnable scripts you actually ran yourself and I can tell you more.
Thanks for your reply. This is part of the code. I think the memory allocation method may be different during training, but I don't know how DL4J allocates memory in the multi-GPU training case.
Ah, ParallelWrapper keeps multiple copies of the model in memory. That's why. ParallelWrapper is mainly for multi-GPU. Go ahead and just use plain single-GPU DL4J and you should be fine.
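To see why multiple model copies matter, here is a rough back-of-envelope estimate. The figures are illustrative assumptions, not measurements: VGG-16 has roughly 138 million FP32 parameters, and each worker typically also holds gradients and updater state alongside the weights.

```python
PARAMS = 138_000_000      # approximate VGG-16 parameter count (assumption)
BYTES_PER_PARAM = 4       # FP32
COPIES_PER_WORKER = 3     # weights + gradients + updater state (assumption)

def model_memory_gb(workers: int) -> float:
    """Approximate parameter-related memory in GB for `workers` model copies."""
    return workers * COPIES_PER_WORKER * PARAMS * BYTES_PER_PARAM / 1e9

print(f"1 worker:  ~{model_memory_gb(1):.1f} GB")   # plain single-GPU training
print(f"2 workers: ~{model_memory_gb(2):.1f} GB")   # ParallelWrapper, 2 workers
```

Under these assumptions, going from one worker to two roughly doubles the parameter-related footprint before activations and prefetched minibatches are even counted.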
Well, I'm sorry, I don't understand where exactly I need to modify the code. I use two GPUs to train a VGG network. If I can't use ParallelWrapper, how can I train VGG on multiple GPUs with a large batch size?
Oh I see that was on purpose. Sorry I missed that in your post.
Ensure your number of workers equals the number of GPUs and reduce your prefetch buffer. That might be what's taking up your memory.
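A minimal sketch of those two settings, assuming a DL4J `model` and a `trainIter` iterator from your existing VGG code (both names are placeholders for your own objects):

```java
// Sketch of the suggested ParallelWrapper settings: workers matching the
// number of GPUs (2 here) and a smaller prefetch buffer to cut memory use.
// Method names follow DL4J's ParallelWrapper.Builder API.
import org.deeplearning4j.parallelism.ParallelWrapper;

ParallelWrapper wrapper = new ParallelWrapper.Builder<>(model)
        .workers(2)            // match your GPU count
        .prefetchBuffer(2)     // lower than before, fewer minibatches held in memory
        .averagingFrequency(3) // average parameters every 3 minibatches
        .build();

wrapper.fit(trainIter);
```

Each prefetched minibatch is held in memory per worker, so with a large batch size even a modest prefetch buffer can add up quickly.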
I tried it but it failed again. So ParallelWrapper's method of allocating memory may be different from others. Is there anything else that could be taking up memory?