Fails to create cuBLAS handle

I’m trying to run some examples on a multi-GPU setup using CUDA 10.1. The output I get is basically:

o.n.l.f.Nd4jBackend - Loaded [JCublasBackend] backend
o.n.n.NativeOpsHolder - Number of threads used for linear algebra: 32
o.n.l.a.o.e.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Linux]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [48]; Memory: [30.0GB];
o.n.l.a.o.e.DefaultOpExecutioner - Blas vendor: [CUBLAS]
o.n.l.j.JCublasBackend - ND4J CUDA build version: 10.1.243
o.n.l.j.JCublasBackend - CUDA device 0: [Tesla P100-PCIE-16GB]; cc: [6.0]; Total memory: [17071734784]
o.n.l.j.JCublasBackend - CUDA device 1: [Tesla P100-PCIE-16GB]; cc: [6.0]; Total memory: [0]
o.d.n.m.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
Exception in thread "main" java.lang.RuntimeException: cuBLAS handle creation failed !; Error code: [1]
        at org.nd4j.nativeblas.Nd4jCuda.lcBlasHandle(Native Method)
        at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaCublasHandle(CudaZeroHandler.java:1016)
        at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaContext(CudaZeroHandler.java:1041)
        at org.nd4j.jita.handler.impl.CudaZeroHandler.getDeviceContext(CudaZeroHandler.java:1003)
        at org.nd4j.jita.allocator.impl.AtomicAllocator.getDeviceContext(AtomicAllocator.java:218)
        at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.initPointers(BaseCudaDataBuffer.java:393)
        at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:405)
        at org.nd4j.linalg.jcublas.buffer.CudaFloatDataBuffer.<init>(CudaFloatDataBuffer.java:68)
        at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.create(CudaDataBufferFactory.java:376)
        at org.nd4j.linalg.factory.Nd4j.createBuffer(Nd4j.java:1466)
        at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1524)
        at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1519)
        at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:4298)
        at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3986)
        at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:693)
        at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.init(MultiLayerNetwork.java:609)
        at org.deeplearning4j.examples.recurrent.character.LSTMCharModellingExample.main(LSTMCharModellingExample.java:98)

I’ve tried restricting ND4J to GPU 0 with
CudaEnvironment.getInstance().getConfiguration().allowMultiGPU(false).allowCrossDeviceAccess(false).useDevice(0);
since it looks more available than GPU 1 (which reports a total memory of 0), but the error persists.
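The startup log hints at why GPU 0 looks more available: device 1 reports Total memory: [0]. As a minimal sketch, the hypothetical helper below (DevicePicker is not part of ND4J) scans log lines in exactly the format shown above and keeps the ids of devices that report non-zero memory. Treating a zero reading as "unusable" is an assumption on my part, not documented ND4J behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DevicePicker {
    // Matches ND4J startup lines such as:
    // CUDA device 1: [Tesla P100-PCIE-16GB]; cc: [6.0]; Total memory: [0]
    private static final Pattern DEVICE_LINE = Pattern.compile(
            "CUDA device (\\d+): \\[(.*?)\\]; cc: \\[(.*?)\\]; Total memory: \\[(\\d+)\\]");

    /** Returns the ids of devices whose reported total memory is greater than zero. */
    public static List<Integer> usableDevices(List<String> logLines) {
        List<Integer> ids = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = DEVICE_LINE.matcher(line);
            if (m.find() && Long.parseLong(m.group(4)) > 0) {
                ids.add(Integer.parseInt(m.group(1)));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "o.n.l.j.JCublasBackend - CUDA device 0: [Tesla P100-PCIE-16GB]; cc: [6.0]; Total memory: [17071734784]",
            "o.n.l.j.JCublasBackend - CUDA device 1: [Tesla P100-PCIE-16GB]; cc: [6.0]; Total memory: [0]");
        List<Integer> ids = usableDevices(log);
        // Comma-separated form of the usable device ids
        System.out.println(ids.stream().map(String::valueOf).collect(Collectors.joining(",")));
    }
}
```

The first id in the returned list is what you would pass to useDevice(...). The log format may change between ND4J versions, so treat this as a one-off diagnostic rather than something to depend on.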

Any ideas how I could go about solving it?

You can try setting the CUDA_VISIBLE_DEVICES environment variable:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars
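CUDA reads CUDA_VISIBLE_DEVICES when the CUDA runtime initializes, so the variable has to be in the environment before the JVM loads the ND4J CUDA backend, and a running JVM cannot change its own environment. Besides exporting it in the shell, one option is to launch the job as a child process. A minimal sketch using only java.lang.ProcessBuilder; GpuLauncher and the command line in main are hypothetical placeholders:

```java
import java.util.List;
import java.util.Map;

public class GpuLauncher {
    /** Builds, but does not start, a process that sees only the given CUDA devices. */
    public static ProcessBuilder withVisibleDevices(List<String> command, String deviceIds) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() starts as a copy of this JVM's environment; override one variable.
        Map<String, String> env = pb.environment();
        env.put("CUDA_VISIBLE_DEVICES", deviceIds);
        // Forward the child's stdout/stderr to this console.
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) {
        // Hypothetical command line: replace the jar and main class with your own.
        ProcessBuilder pb = withVisibleDevices(
                List.of("java", "-cp", "my-app.jar", "com.example.Main"), "0");
        System.out.println("CUDA_VISIBLE_DEVICES=" + pb.environment().get("CUDA_VISIBLE_DEVICES"));
        // To actually launch it: int exitCode = pb.start().waitFor();
    }
}
```

Inside the child, the visible devices are renumbered from 0, so with CUDA_VISIBLE_DEVICES=1 the physical GPU 1 would show up as CUDA device 0 in the ND4J startup log.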

Thanks @treo, I set it using export CUDA_VISIBLE_DEVICES=0 and it worked: the example I was trying to run now runs without any problem. However, when I try to run one of my own models, I get a _reductionPointer allocation failure:

Caused by: java.lang.RuntimeException: _reductionPointer allocation failed; Error code: [2]
	at org.nd4j.nativeblas.Nd4jCuda.lcScalarPointer(Native Method)
	at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaContext(CudaZeroHandler.java:1210)
	at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:225)
	at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:526)
	at org.nd4j.jita.allocator.impl.AtomicAllocator.allocateMemory(AtomicAllocator.java:438)
	at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.initPointers(BaseCudaDataBuffer.java:297)
	at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.<init>(BaseCudaDataBuffer.java:316)
	at org.nd4j.linalg.jcublas.buffer.CudaLongDataBuffer.<init>(CudaLongDataBuffer.java:87)
	at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createLong(CudaDataBufferFactory.java:1119)
	at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createLong(CudaDataBufferFactory.java:1114)
	at org.nd4j.linalg.factory.Nd4j.createBufferDetachedImpl(Nd4j.java:1296)
	at org.nd4j.linalg.factory.Nd4j.createBufferDetached(Nd4j.java:1278)
	at org.nd4j.linalg.factory.Nd4j.read(Nd4j.java:2552)
	at org.deeplearning4j.util.ModelSerializer.restoreMultiLayerNetwork(ModelSerializer.java:256)
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.load(MultiLayerNetwork.java:3822)
	at org.apache.asterix.external.library.LSTMStreamedDataSentimentFunction.initialize(LSTMStreamedDataSentimentFunction.java:136)
	at org.apache.asterix.external.library.ExternalFunction.initialize(ExternalFunction.java:94)
	at org.apache.asterix.external.library.ExternalScalarFunction.<init>(ExternalFunctionProvider.java:58)
	... 11 more

Does this mean that the single GPU I am running on doesn’t have enough memory for my model? The error happens during a call to MultiLayerNetwork.load() on a model with a single LSTM layer and an embedding layer.
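The numbers in "Error code: [...]" look like raw CUDA status codes. Assuming ND4J passes them through unchanged (the traces do not confirm this), code 1 from cuBLAS handle creation would be CUBLAS_STATUS_NOT_INITIALIZED and code 2 from the _reductionPointer allocation would be cudaErrorMemoryAllocation. The hypothetical CudaErrorCodes helper below decodes a message using a small table of values from the CUDA runtime (cudaError_t) and cuBLAS (cublasStatus_t) APIs; choosing the table by looking for "cuBLAS" in the message is a heuristic.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CudaErrorCodes {
    // A few cudaError_t values from the CUDA runtime API.
    private static final Map<Integer, String> CUDA_RUNTIME = Map.of(
            0, "cudaSuccess",
            2, "cudaErrorMemoryAllocation",
            3, "cudaErrorInitializationError");

    // A few cublasStatus_t values from the cuBLAS API.
    private static final Map<Integer, String> CUBLAS = Map.of(
            0, "CUBLAS_STATUS_SUCCESS",
            1, "CUBLAS_STATUS_NOT_INITIALIZED",
            3, "CUBLAS_STATUS_ALLOC_FAILED",
            7, "CUBLAS_STATUS_INVALID_VALUE",
            8, "CUBLAS_STATUS_ARCH_MISMATCH",
            13, "CUBLAS_STATUS_EXECUTION_FAILED",
            14, "CUBLAS_STATUS_INTERNAL_ERROR");

    private static final Pattern CODE = Pattern.compile("Error code: \\[(\\d+)\\]");

    /** Maps the "Error code: [n]" part of an exception message to a symbolic name. */
    public static String decode(String message) {
        Matcher m = CODE.matcher(message);
        if (!m.find()) {
            return "no error code found";
        }
        int code = Integer.parseInt(m.group(1));
        // Heuristic: messages mentioning cuBLAS carry a cublasStatus_t,
        // anything else is assumed to be a CUDA runtime cudaError_t.
        Map<Integer, String> table = message.contains("cuBLAS") ? CUBLAS : CUDA_RUNTIME;
        return table.getOrDefault(code, "unknown code " + code);
    }

    public static void main(String[] args) {
        System.out.println(decode("cuBLAS handle creation failed !; Error code: [1]"));
        System.out.println(decode("_reductionPointer allocation failed; Error code: [2]"));
    }
}
```

A decoded out-of-memory code does not necessarily mean the model is too large: later in this thread the failure went away after a version mismatch was fixed, so the name is a hint rather than a diagnosis.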

@treo I just saw your answer to my other topic, and indeed, while the example I ran above used beta7, the LSTM-based model that caused the _reductionPointer error was using beta6. I’ll upgrade and come back here with the results.

@treo Can models built with beta6 be loaded in beta7, though? Now I am getting a NoClassDefFoundError:

Caused by: java.lang.NoClassDefFoundError: org/bytedeco/javacpp/indexer/UIntIndexer
	at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createLong(CudaDataBufferFactory.java:1071) ~[nd4j-cuda-10.1-1.0.0-beta7.jar:?]
	at org.nd4j.linalg.jcublas.buffer.factory.CudaDataBufferFactory.createLong(CudaDataBufferFactory.java:1066) ~[nd4j-cuda-10.1-1.0.0-beta7.jar:?]
	at org.nd4j.linalg.factory.Nd4j.createBufferDetachedImpl(Nd4j.java:1336) ~[nd4j-api-1.0.0-beta7.jar:1.0.0-beta7]
	at org.nd4j.linalg.factory.Nd4j.createBufferDetached(Nd4j.java:1318) ~[nd4j-api-1.0.0-beta7.jar:1.0.0-beta7]
	at org.nd4j.linalg.factory.Nd4j.read(Nd4j.java:2575) ~[nd4j-api-1.0.0-beta7.jar:1.0.0-beta7]
	at org.deeplearning4j.util.ModelSerializer.restoreMultiLayerNetworkHelper(ModelSerializer.java:283) ~[deeplearning4j-nn-1.0.0-beta7.jar:?]
	at org.deeplearning4j.util.ModelSerializer.restoreMultiLayerNetwork(ModelSerializer.java:238) ~[deeplearning4j-nn-1.0.0-beta7.jar:?]
	at org.deeplearning4j.util.ModelSerializer.restoreMultiLayerNetwork(ModelSerializer.java:222) ~[deeplearning4j-nn-1.0.0-beta7.jar:?]
	at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.load(MultiLayerNetwork.java:3839) ~[deeplearning4j-nn-1.0.0-beta7.jar:?]
	at org.apache.asterix.external.library.LSTMStreamedDataSentimentFunction.initialize(LSTMStreamedDataSentimentFunction.java:133) ~[?:?]

Could it be that I messed up my JavaCPP dependencies or something? I basically did a find-and-replace of beta6 with beta7.
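A NoClassDefFoundError for org/bytedeco/javacpp/indexer/UIntIndexer means nd4j-cuda was compiled against a class that the JavaCPP jar actually on the runtime classpath does not provide. One quick check is to ask the class loader whether, and from where, a class is found. A minimal sketch using only the JDK; ClassOrigin is a hypothetical name, and the class names in main are JavaCPP's Loader and the UIntIndexer class from the trace above:

```java
import java.security.CodeSource;

public class ClassOrigin {
    /**
     * Returns the location a class was loaded from (typically a jar),
     * "bootstrap/JDK" for JDK classes, or "not on classpath" if it cannot be loaded.
     */
    public static String locate(String className) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap/JDK";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError from missing dependencies
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.bytedeco.javacpp.Loader",
            "org.bytedeco.javacpp.indexer.UIntIndexer",
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If Loader is found but UIntIndexer is not, the printed jar location usually reveals which JavaCPP version was actually picked up.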

Yes, that looks like a problem with your JavaCPP dependencies. Have you specified a JavaCPP version manually?
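A manually pinned JavaCPP version can end up replacing, or sitting next to, the one nd4j expects. Assuming the dependencies are plain jars on the classpath (in a shaded uber-jar this finds nothing), the hypothetical JavaCppVersions sketch below lists every javacpp-&lt;version&gt;*.jar it sees and warns when more than one version is present:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaCppVersions {
    // Matches jar names like javacpp-1.5.3.jar or javacpp-1.5.3-linux-x86_64.jar
    private static final Pattern JAR =
            Pattern.compile("^javacpp-(\\d+(?:\\.\\d+)*)(?:-[\\w-]+)?\\.jar$");

    /** Collects the distinct JavaCPP versions found among the given classpath entries. */
    public static Set<String> versions(List<String> classpathEntries) {
        Set<String> found = new TreeSet<>();
        for (String entry : classpathEntries) {
            String name = new File(entry).getName();
            Matcher m = JAR.matcher(name);
            if (m.matches()) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                System.getProperty("java.class.path").split(File.pathSeparator));
        Set<String> found = versions(entries);
        System.out.println("javacpp versions on classpath: " + found);
        if (found.size() > 1) {
            System.out.println("WARNING: more than one javacpp version on the classpath");
        }
    }
}
```

If the project uses Maven, mvn dependency:tree -Dincludes=org.bytedeco:javacpp shows which JavaCPP version was resolved and which dependency pulled it in.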

Sorry for getting back to you so late. Yes, I had mistakenly specified the wrong JavaCPP version; everything works like a charm now!