I have a box with 12 CPU cores (24 threads with hyper-threading) and two Nvidia GeForce RTX 2080 cards, running DL4J beta6. I have a simple sequential model with 32 nodes in the first hidden layer and 16 nodes in the second hidden layer. When I train it on CPU only, all 24 threads are busy. But when I switch to the GPU backend, I only see two threads running and progress is much slower than CPU training. I went through all the diagnosis steps here:
https://deeplearning4j.konduit.ai/config/backends/performance-issues
but still could not figure out why it is so slow. GPU utilization is only about 6% on one card (the other sits at 0%). Is there anything else I might have missed? My OS is Ubuntu 18.04. Thanks for any pointers.
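For context, the model is shaped roughly like this in DL4J (the input and output sizes below are placeholders, not my real data, and the updater is just an example):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class SmallNetSketch {
    public static void main(String[] args) {
        int numInputs = 10;   // placeholder: actual feature count not shown here
        int numOutputs = 2;   // placeholder: actual label count not shown here

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Adam())
            .list()
            // 32-node first hidden layer
            .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(32)
                .activation(Activation.RELU).build())
            // 16-node second hidden layer
            .layer(1, new DenseLayer.Builder().nIn(32).nOut(16)
                .activation(Activation.RELU).build())
            .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .nIn(16).nOut(numOutputs)
                .activation(Activation.SOFTMAX).build())
            .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        System.out.println(net.summary());
    }
}
```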
Your model is very small, so almost all of the time is spent shuffling data between the CPU and the GPU rather than on the actual computation.
For a model that small, training on the CPU is usually more efficient.
Also, multiple GPUs will only be used if you run training through a ParallelWrapper,
e.g.: https://github.com/KonduitAI/deeplearning4j-examples/blob/master/dl4j-cuda-specific-examples/src/main/java/org/deeplearning4j/examples/multigpu/MultiGpuLenetMnistExample.java#L107-L115
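For reference, wrapping an existing model in a ParallelWrapper looks roughly like this (the prefetch/worker/averaging values here are illustrative, not tuned for your setup):

```java
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.parallelism.ParallelWrapper;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class MultiGpuSketch {
    public static void train(MultiLayerNetwork model, DataSetIterator trainData) {
        ParallelWrapper wrapper = new ParallelWrapper.Builder(model)
            .prefetchBuffer(24)               // DataSets pre-loaded per worker (illustrative value)
            .workers(2)                       // one worker per GPU on a dual-GPU box
            .averagingFrequency(3)            // average parameters every 3 minibatches (illustrative)
            .reportScoreAfterAveraging(true)
            .build();

        // The wrapper drives training instead of model.fit(...)
        wrapper.fit(trainData);
    }
}
```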
In any case, ND4J will only use one Backend at a time, so you can’t train on both CPU and GPU at the same time within the same process.
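If you want to confirm which backend is actually active in a given process, printing it at startup is a quick sanity check (a minimal sketch):

```java
import org.nd4j.linalg.factory.Nd4j;

public class BackendCheck {
    public static void main(String[] args) {
        // Prints the single ND4J backend loaded for this process,
        // which depends on whether nd4j-native or nd4j-cuda is on the classpath.
        System.out.println(Nd4j.getBackend().getClass().getName());
    }
}
```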
Paul,
Thanks for the insights. That makes sense. I also found out that my application-level driver code was not invoking the GPU training properly: I was sending too many concurrent training requests to the GPU, and only the first two requests survived; the rest ran out of GPU memory. So only two threads ended up running at the top level. Combined with the CPU-GPU data-transfer overhead you mentioned, that made training on the GPU slower. I’ll stick with CPU training for this small model. Thank you.
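In case it helps anyone else hitting the same problem, one way to avoid flooding the GPU with concurrent fit() calls is to bound how many training jobs run at once. This is only a hypothetical sketch of such a throttle (the method and its arguments are made up for illustration, not my actual driver code):

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class ThrottledTrainer {
    // Hypothetical driver-side throttle: at most two fit() calls run at once,
    // so later requests don't exhaust GPU memory.
    public static void trainAll(List<MultiLayerNetwork> models,
                                List<DataSetIterator> data) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < models.size(); i++) {
            final MultiLayerNetwork net = models.get(i);
            final DataSetIterator iter = data.get(i);
            pool.submit(() -> net.fit(iter));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```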