20/03/02 05:53:04 WARN Nd4jBackend: Skipped [JCublasBackend] backend (unavailable): java.lang.RuntimeException: No CUDA devices were found in system
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:585)
at ...
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5131)
at org.nd4j.linalg.factory.Nd4j.<clinit>(Nd4j.java:226)
... 5 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:218)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5128)
... 6 more
I also found that System.getProperties().containsKey(ND4JSystemProperties.DYNAMIC_LOAD_CLASSPATH_PROPERTY) returns false and System.getenv(ND4JEnvironmentVars.BACKEND_DYNAMIC_LOAD_CLASSPATH) returns null.
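The property and environment variable above appear to relate to loading a backend from an extra classpath. A more basic question is whether any backend class is visible on the classpath at all. The sketch below checks this with plain Class.forName and does not run static initializers. The fully qualified class names are assumptions: the log only shows the simple name JCublasBackend and the package org.nd4j.linalg.jcublas, and the CPU backend name is inferred from the nd4j-native module. A class being found does not mean the backend is usable. The first log line shows JCublasBackend being skipped because no CUDA device was present.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BackendClasspathCheck {

    // ASSUMPTION: fully qualified backend class names; adjust to your ND4J version.
    static final String[] BACKEND_CLASSES = {
        "org.nd4j.linalg.jcublas.JCublasBackend",   // CUDA backend (nd4j-cuda-*)
        "org.nd4j.linalg.cpu.nativecpu.CpuBackend"  // CPU backend (nd4j-native)
    };

    /** Maps each class name to whether the class can be located on the classpath. */
    static Map<String, Boolean> findBackends(String[] classNames) {
        Map<String, Boolean> found = new LinkedHashMap<>();
        ClassLoader loader = BackendClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize = false: locate the class without running its static initializer
                Class.forName(name, false, loader);
                found.put(name, true);
            } catch (ClassNotFoundException e) {
                found.put(name, false);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        findBackends(BACKEND_CLASSES).forEach((name, present) ->
            System.out.println((present ? "FOUND    " : "MISSING  ") + name));
    }
}
```

If both entries print MISSING, the NoAvailableBackendException above is expected, because no backend artifact is on the runtime classpath.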
20/03/03 08:43:48 INFO DefaultOpExecutioner: Backend used: [CUDA]; OS: [Linux]
20/03/03 08:43:48 INFO DefaultOpExecutioner: Cores: [6]; Memory: [10.0GB];
20/03/03 08:43:48 INFO DefaultOpExecutioner: Blas vendor: [CUBLAS]
20/03/03 08:43:48 INFO JCublasBackend: ND4J CUDA build version: 10.2.89
20/03/03 08:43:48 INFO JCublasBackend: CUDA device 0: [GeForce GTX 1660]; cc: [7.5]; Total memory: [6224936960]
...
20/03/03 08:44:02 WARN Dropout: CuDNN execution failed - falling back on built-in implementation
java.lang.RuntimeException: cuDNN status = 8: CUDNN_STATUS_EXECUTION_FAILED
at org.deeplearning4j.nn.layers.BaseCudnnHelper.checkCudnn(BaseCudnnHelper.java:48)
at org.deeplearning4j.nn.layers.dropout.CudnnDropoutHelper.applyDropout(CudnnDropoutHelper.java:188)
at org.deeplearning4j.nn.conf.dropout.Dropout.applyDropout(Dropout.java:173)
at org.deeplearning4j.nn.layers.AbstractLayer.applyDropOutIfNecessary(AbstractLayer.java:295)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:444)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
at org.deeplearning4j.nn.graph.ComputationGraph.ffToLayerActivationsInWS(ComputationGraph.java:2136)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1373)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1342)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:170)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:63)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1166)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1116)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1103)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:985)
at com.sefonsoft.tc.ai.cnn.model.yolo.YoloTrainer.train(YoloTrainer.java:127)
at com.sefonsoft.tc.ai.cnn.model.yolo.YoloTrainer.main(YoloTrainer.java:73)
Exception in thread "main" java.lang.RuntimeException: Failed to allocate 94633984 bytes from DEVICE [0] memory
at org.nd4j.jita.memory.CudaMemoryManager.allocate(CudaMemoryManager.java:78)
at org.nd4j.jita.workspace.CudaWorkspace.alloc(CudaWorkspace.java:224)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:507)
at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:438)
at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:326)
at org.nd4j.linalg.jcublas.buffer.CudaFloatDataBuffer.<init>(CudaFloatDataBuffer.java:67)
at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.create(CudaDataBufferFactory.java:417)
at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1443)
at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.createUninitialized(JCublasNDArrayFactory.java:1552)
at org.nd4j.linalg.factory.Nd4j.createUninitialized(Nd4j.java:4339)
at org.nd4j.linalg.workspace.BaseWorkspaceMgr.createUninitialized(BaseWorkspaceMgr.java:270)
at org.deeplearning4j.nn.conf.dropout.Dropout.applyDropout(Dropout.java:200)
at org.deeplearning4j.nn.layers.AbstractLayer.applyDropOutIfNecessary(AbstractLayer.java:295)
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:444)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
at org.deeplearning4j.nn.graph.ComputationGraph.ffToLayerActivationsInWS(ComputationGraph.java:2136)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1373)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1342)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:170)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:63)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1166)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1116)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1103)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:985)
at com.sefonsoft.tc.ai.cnn.model.yolo.YoloTrainer.train(YoloTrainer.java:127)
at com.sefonsoft.tc.ai.cnn.model.yolo.YoloTrainer.main(YoloTrainer.java:73)
Suppressed: java.lang.RuntimeException: Failed to allocate 4049152614 bytes from DEVICE [0] memory
at org.nd4j.jita.memory.CudaMemoryManager.allocate(CudaMemoryManager.java:78)
at org.nd4j.jita.workspace.CudaWorkspace.init(CudaWorkspace.java:92)
at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.initializeWorkspace(Nd4jWorkspace.java:510)
at org.nd4j.linalg.memory.abstracts.Nd4jWorkspace.close(Nd4jWorkspace.java:653)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1423)
... 10 more
Notice that the reported total memory (Total memory: [6224936960]) is larger than the 4049152614 bytes that failed to allocate.
Everything points to this being an ordinary out-of-memory problem.
It failed to allocate about 4 GB of additional memory. A request being smaller than the device's total memory doesn't mean that this much memory is actually free at the moment the allocation is attempted.
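The arithmetic behind this answer can be sketched as follows. The byte counts come from the log above. The "already used" amount is a hypothetical figure for illustration only, because the log does not say how much device memory was occupied when the workspace tried to grow. The comparison that decides whether an allocation succeeds is against currently free memory, not against the device total.

```java
final class DeviceMemoryMath {

    static final long MIB = 1L << 20;
    static final long GIB = 1L << 30;

    // Values copied from the log above.
    static final long TOTAL_DEVICE_BYTES = 6_224_936_960L;      // "Total memory" of CUDA device 0
    static final long WORKSPACE_REQUEST_BYTES = 4_049_152_614L; // suppressed workspace allocation
    static final long DROPOUT_REQUEST_BYTES = 94_633_984L;      // first failed allocation (dropout)

    /** An allocation can only succeed if the currently free memory covers the request. */
    static boolean fits(long requestBytes, long freeBytes) {
        return requestBytes <= freeBytes;
    }

    public static void main(String[] args) {
        System.out.printf("total:     %.2f GiB%n", TOTAL_DEVICE_BYTES / (double) GIB);
        System.out.printf("workspace: %.2f GiB%n", WORKSPACE_REQUEST_BYTES / (double) GIB);
        System.out.printf("dropout:   %.2f MiB%n", DROPOUT_REQUEST_BYTES / (double) MIB);

        // HYPOTHETICAL: assume 2.5 GiB is already held by the CUDA context,
        // model parameters and other workspaces. The real figure is unknown.
        long hypotheticalUsed = (long) (2.5 * GIB);
        long free = TOTAL_DEVICE_BYTES - hypotheticalUsed;

        System.out.println("fits in total: " + fits(WORKSPACE_REQUEST_BYTES, TOTAL_DEVICE_BYTES));
        System.out.println("fits in free:  " + fits(WORKSPACE_REQUEST_BYTES, free));
    }
}
```

Under that assumption, the workspace request is smaller than the device total, yet it is larger than the memory left free. That is the situation the "Failed to allocate" messages describe.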