No CUDA devices were found

I get an error like this:

20/03/02 05:53:04 WARN Nd4jBackend: Skipped [JCublasBackend] backend (unavailable): java.lang.RuntimeException: No CUDA devices were found in system
Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:585)
	at ...
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
	at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5131)
	at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:226)
	... 5 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
	at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:218)
	at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5128)
	... 6 more

The DL4J version is 1.0.0-beta6.

Part of the pom.xml:

    <dl4j.version>1.0.0-beta6</dl4j.version>
    <cuda.version>10.2</cuda.version>

    		<dependency>
    			<groupId>org.nd4j</groupId>
    			<artifactId>nd4j-cuda-${cuda.version}</artifactId>
    			<version>${dl4j.version}</version>
    		</dependency>
    		<dependency>
    			<groupId>org.nd4j</groupId>
    			<artifactId>nd4j-cuda-${cuda.version}-platform</artifactId>
    			<version>${dl4j.version}</version>
    		</dependency>
    		<dependency>
    			<groupId>org.deeplearning4j</groupId>
    			<artifactId>deeplearning4j-cuda-${cuda.version}</artifactId>
    			<version>${dl4j.version}</version>
    		</dependency>

Server information:

uname -r

4.15.0-76-generic

cat /proc/driver/nvidia/version

NVRM version: NVIDIA UNIX x86_64 Kernel Module  435.21  Sun Aug 25 08:17:57 CDT 2019
GCC version:  gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1)

cat /usr/local/cuda/version.txt

CUDA Version 10.2.89

lspci | grep -i nvidia

01:00.0 VGA compatible controller: NVIDIA Corporation Device 2184 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 1aeb (rev a1)
01:00.2 USB controller: NVIDIA Corporation Device 1aec (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device 1aed (rev a1)
# nvidia-smi -L
GPU 0: GeForce GTX 1660 (UUID: GPU-4d72b88e-5d39-6ca0-6432-06a160a1be62)

I also tested and found that:
System.getProperties().containsKey(ND4JSystemProperties.DYNAMIC_LOAD_CLASSPATH_PROPERTY) returns false, and
System.getenv(ND4JEnvironmentVars.BACKEND_DYNAMIC_LOAD_CLASSPATH) returns null.

Are you sure you have only 1 CUDA library installed?

I removed and reinstalled CUDA, and the result is the same.

The commands:
apt-get --purge remove "cublas" "cuda*"

dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb

apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub

apt-get update

apt-get install cuda

Looks like you have an old version of the driver installed for some reason. Upgrade that!
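The mismatch is visible in the output above: the kernel module reports 435.21, while CUDA 10.2 needs driver 440.33 or newer on Linux according to NVIDIA's release notes. Here is a minimal Java sketch of that check. The class name, the helper names and the hard-coded minimum are my own assumptions for illustration; none of this is part of ND4J.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DriverCheck {
    // Assumption: minimum Linux driver for CUDA 10.2, as listed in NVIDIA's release notes.
    public static final String MIN_DRIVER_FOR_CUDA_10_2 = "440.33";

    private static final Pattern NVRM =
            Pattern.compile("Kernel Module\\s+(\\d+(?:\\.\\d+)*)");

    // Extracts the driver version (e.g. "435.21") from the "NVRM version" line,
    // or returns null when the line does not contain one.
    public static String parseDriverVersion(String line) {
        Matcher m = NVRM.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Compares dotted versions numerically, component by component.
    // Missing components count as 0, so "440.33.01" >= "440.33".
    public static boolean isAtLeast(String version, String minimum) {
        String[] a = version.split("\\.");
        String[] b = minimum.split("\\.");
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("/proc/driver/nvidia/version");
        if (!Files.exists(p)) {
            System.out.println("No NVIDIA kernel module loaded");
            return;
        }
        for (String line : Files.readAllLines(p)) {
            String v = parseDriverVersion(line);
            if (v != null) {
                boolean ok = isAtLeast(v, MIN_DRIVER_FOR_CUDA_10_2);
                System.out.println("Driver " + v
                        + (ok ? " is new enough" : " is too old") + " for CUDA 10.2");
            }
        }
    }
}
```

Run against the 435.21 line shown earlier, the comparison fails, which matches the "No CUDA devices were found" symptom.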

It works after running:
apt-get remove --purge nvidia*
ubuntu-drivers devices
apt-get install nvidia-driver-440
reboot

Thanks :grinning:

Now I get a memory error. Part of the log:

20/03/03 08:43:48 INFO DefaultOpExecutioner: Backend used: [CUDA]; OS: [Linux]
20/03/03 08:43:48 INFO DefaultOpExecutioner: Cores: [6]; Memory: [10.0GB];
20/03/03 08:43:48 INFO DefaultOpExecutioner: Blas vendor: [CUBLAS]
20/03/03 08:43:48 INFO JCublasBackend: ND4J CUDA build version: 10.2.89
20/03/03 08:43:48 INFO JCublasBackend: CUDA device 0: [GeForce GTX 1660]; cc: [7.5]; Total memory: [6224936960]
...
20/03/03 08:44:02 WARN Dropout: CuDNN execution failed - falling back on built-in implementation
java.lang.RuntimeException: cuDNN status = 8: CUDNN_STATUS_EXECUTION_FAILED
	at org.deeplearning4j.nn.layers.BaseCudnnHelper.checkCudnn(BaseCudnnHelper.java:48)
	at org.deeplearning4j.nn.layers.dropout.CudnnDropoutHelper.applyDropout(CudnnDropoutHelper.java:188)
	at org.deeplearning4j.nn.conf.dropout.Dropout.applyDropout(Dropout.java:173)
	at org.deeplearning4j.nn.layers.AbstractLayer.applyDropOutIfNecessary(AbstractLayer.java:295)
	at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:444)
	at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
	at org.deeplearning4j.nn.graph.ComputationGraph.ffToLayerActivationsInWS(ComputationGraph.java:2136)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1373)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1342)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:170)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:63)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
	at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1166)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1116)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1103)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:985)
	at com.sefonsoft.tc.ai.cnn.model.yolo.YoloTrainer.train(YoloTrainer.java:127)
	at com.sefonsoft.tc.ai.cnn.model.yolo.YoloTrainer.main(YoloTrainer.java:73)
Exception in thread "main" java.lang.RuntimeException: Failed to allocate 94633984 bytes from DEVICE [0] memory
	at org.nd4j.jita.memory.CudaMemoryManager.allocate(CudaMemoryManager.java:78)
	at org.nd4j.jita.workspace.CudaWorkspace.alloc(CudaWorkspace.java:224)
	at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:507)
	at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:438)
	at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:326)
	at org.nd4j.linalg.jcublas.buffer.CudaFloatDataBuffer.<init>(CudaFloatDataBuffer.java:67)
	at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.create(CudaDataBufferFactory.java:417)
	at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1443)
	at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.createUninitialized(JCublasNDArrayFactory.java:1552)
	at org.nd4j.linalg.factory.Nd4j.createUninitialized(Nd4j.java:4339)
	at org.nd4j.linalg.workspace.BaseWorkspaceMgr.createUninitialized(BaseWorkspaceMgr.java:270)
	at org.deeplearning4j.nn.conf.dropout.Dropout.applyDropout(Dropout.java:200)
	at org.deeplearning4j.nn.layers.AbstractLayer.applyDropOutIfNecessary(AbstractLayer.java:295)
	at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:444)
	at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
	at org.deeplearning4j.nn.graph.ComputationGraph.ffToLayerActivationsInWS(ComputationGraph.java:2136)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1373)
	at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1342)
	at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:170)
	at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:63)
	at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
	at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1166)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1116)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1103)
	at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:985)
	at com.sefonsoft.tc.ai.cnn.model.yolo.YoloTrainer.train(YoloTrainer.java:127)
	at com.sefonsoft.tc.ai.cnn.model.yolo.YoloTrainer.main(YoloTrainer.java:73)
	Suppressed: java.lang.RuntimeException: Failed to allocate 4049152614 bytes from DEVICE [0] memory
		at org.nd4j.jita.memory.CudaMemoryManager.allocate(CudaMemoryManager.java:78)
		at org.nd4j.jita.workspace.CudaWorkspace.init(CudaWorkspace.java:92)
		at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.initializeWorkspace(Nd4jWorkspace.java:510)
		at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.close(Nd4jWorkspace.java:653)
		at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1423)
		... 10 more

Notice that Total memory: [6224936960] is more than 4049152614 bytes.

What is the model you’re trying to run there?

Also, can you please show nvidia-smi output BEFORE you’re launching your NN?

CNN model (YOLOv2).

# nvidia-smi
Tue Mar  3 09:06:45 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1660    On   | 00000000:01:00.0  On |                  N/A |
| 40%   30C    P8    12W / 120W |      0MiB /  5936MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

These methods don’t work:

  1. Use '-Xmx10g -Xmx10g -Dorg.bytedeco.javacpp.maxbytes=5g'

  2. Add to pom.xml:

     <dependency>
     	<groupId>org.bytedeco</groupId>
     	<artifactId>cuda-platform-redist</artifactId>
     	<version>10.2-7.6-1.5.2</version>
     </dependency>

Of course they don’t. The exception message says you’re short of GPU memory.

I trained another NLP model with “Total Parameters: 29,938,877”, and it runs.
So the environment is all right.

Everything points at it being just a normal out of memory problem.

This means that it failed to allocate about 4 GB of additional memory. Just because this number is less than your total memory doesn’t mean that much memory is still available at the point in time when we try to allocate it.
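To put numbers on that, here is a small Java sketch that converts the byte counts from the log into MiB and checks a request against free memory rather than total memory. The log does not say how much was already in use when the allocation failed, so the 2500 MiB "used" figure below is purely a hypothetical value for illustration; the class and method names are mine as well.

```java
import java.util.Locale;

public class GpuMemoryMath {
    static final long MIB = 1024L * 1024L;

    public static double toMiB(long bytes) {
        return bytes / (double) MIB;
    }

    // A request can only succeed if it fits into what is currently free,
    // not into the card's total capacity.
    public static boolean fits(long requestBytes, long totalBytes, long usedBytes) {
        return requestBytes <= totalBytes - usedBytes;
    }

    public static void main(String[] args) {
        long total = 6_224_936_960L;      // "Total memory" from the JCublasBackend log line
        long workspace = 4_049_152_614L;  // failed workspace allocation (suppressed exception)
        long dropout = 94_633_984L;       // failed dropout buffer allocation

        System.out.printf(Locale.ROOT, "total     : %.1f MiB%n", toMiB(total));
        System.out.printf(Locale.ROOT, "workspace : %.1f MiB%n", toMiB(workspace));
        System.out.printf(Locale.ROOT, "dropout   : %.2f MiB%n", toMiB(dropout));

        // Hypothetical: assume 2500 MiB is already held by parameters,
        // earlier workspaces and other buffers at the time of the request.
        long assumedUsed = 2_500L * MIB;
        System.out.println("fits on an empty card: " + fits(workspace, total, 0L));
        System.out.println("fits with assumed use: " + fits(workspace, total, assumedUsed));
    }
}
```

The same request passes against an empty card and fails once enough memory is already taken, which is the situation described above.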

Yes.
It runs when I reduce the batch size from 16 to 4.

# nvidia-smi
Wed Mar  4 01:39:31 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1660    On   | 00000000:01:00.0  On |                  N/A |
| 46%   40C    P0    51W / 120W |   5397MiB /  5936MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
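With batch size 4 the card is still nearly full: 5397 MiB of 5936 MiB in use leaves about 539 MiB, so there is little room to raise the batch size again. Here is a small Java sketch that pulls those two numbers out of an nvidia-smi table row and computes the headroom. The class and method names are made up for this example, and it assumes the `used MiB / total MiB` layout shown in the output above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmiHeadroom {
    // Matches the "used / total" memory column, e.g. "5397MiB / 5936MiB".
    private static final Pattern MEM = Pattern.compile("(\\d+)MiB\\s*/\\s*(\\d+)MiB");

    // Returns {usedMiB, totalMiB} parsed from one nvidia-smi table row,
    // or null if the row has no memory column.
    public static long[] parseMemory(String row) {
        Matcher m = MEM.matcher(row);
        if (!m.find()) {
            return null;
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }

    public static long freeMiB(long[] usedTotal) {
        return usedTotal[1] - usedTotal[0];
    }

    public static void main(String[] args) {
        String row = "| 46%   40C    P0    51W / 120W |   5397MiB /  5936MiB |     98%      Default |";
        long[] mem = parseMemory(row);
        System.out.println("used MiB : " + mem[0]);
        System.out.println("total MiB: " + mem[1]);
        System.out.println("free MiB : " + freeMiB(mem));
    }
}
```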