Can't run DL4J with GPU on Linux

I’m building my application on Windows with the dependencies nd4j-cuda-11.2-platform and deeplearning4j-cuda-11.2. The build produces an executable uber jar, which I then try to run on a different OS (Linux). There I get the following error:

Warning: Could not load Loader: java.lang.UnsatisfiedLinkError: java.io.IOException: Function not implemented
java.io.IOException: Function not implemented
at sun.nio.ch.FileDispatcherImpl.lock0(Native Method)
at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:94)
at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1072)
at java.nio.channels.FileChannel.lock(FileChannel.java:1053)
at org.bytedeco.javacpp.Loader.cacheResource(Loader.java:647)
at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1582)
at org.bytedeco.javacpp.Loader.load(Loader.java:1328)
at org.bytedeco.javacpp.Loader.load(Loader.java:1132)
at org.bytedeco.cuda.global.cudart.(cudart.java:14)
at org.nd4j.linalg.jcublas.JCublasBackend.canRun(JCublasBackend.java:67)
at org.nd4j.linalg.jcublas.JCublasBackend.isAvailable(JCublasBackend.java:52)
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:160)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5092)
at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:270)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraphHelper(ModelSerializer.java:506)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:462)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:647)
at org.deeplearning4j.nn.graph.ComputationGraph.load(ComputationGraph.java:4618)
at edu.mit.ll.seamnet.DL4JEnhancementDNN.(DL4JEnhancementDNN.java:54)
at edu.mit.ll.seamnet.EnhancementDNNFactory.newInstance(EnhancementDNNFactory.java:29)
at edu.mit.ll.seamnet.SpeechEnhancement.(SpeechEnhancement.java:46)
at edu.mit.ll.seamnet.SpeechEnhancement.(SpeechEnhancement.java:52)
at edu.mit.ll.seamnet.SpeechEnhancement.main(SpeechEnhancement.java:268)
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraphHelper(ModelSerializer.java:506)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:462)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:647)
at org.deeplearning4j.nn.graph.ComputationGraph.load(ComputationGraph.java:4618)
at edu.mit.ll.seamnet.DL4JEnhancementDNN.(DL4JEnhancementDNN.java:54)
at edu.mit.ll.seamnet.EnhancementDNNFactory.newInstance(EnhancementDNNFactory.java:29)
at edu.mit.ll.seamnet.SpeechEnhancement.(SpeechEnhancement.java:46)
at edu.mit.ll.seamnet.SpeechEnhancement.(SpeechEnhancement.java:52)
at edu.mit.ll.seamnet.SpeechEnhancement.main(SpeechEnhancement.java:268)
Caused by: java.lang.RuntimeException: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5095)
at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:270)
… 9 more
Caused by: org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: https://deeplearning4j.konduit.ai/nd4j/backend
at org.nd4j.linalg.factory.Nd4jBackend.load(Nd4jBackend.java:196)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5092)

I tried everything in https://deeplearning4j.konduit.ai/config/backends and nothing seems to help.

Any thoughts as to why I’m getting this error? Any help will be greatly appreciated as we are trying to finalize our project.

Thank you!

This is quite an uncommon thing to see.

Can you tell us more about the environment? Which Linux distribution is this? What is the kernel version? What filesystem is the user’s home folder on? Have you set any JavaCPP options?

From the output of hostnamectl:
Operating System: GridOS 18.04.5
Kernel: Linux 4.14.235-llgrid-10ms
Architecture: x86-64

I haven’t set any JavaCPP options.

Thank you for your help!

Can you also share the output of df -Th?

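I see. Your home folder is on lustrefs, which doesn’t support all the usual filesystem features. In particular, the file lock that JavaCPP takes while extracting native libraries into its cache (the FileChannel.lock call under Loader.cacheResource in your stack trace) fails there with "Function not implemented", and by default that cache lives under your home folder.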

You should be able to run your application with -Dorg.bytedeco.javacpp.cachedir=/tmp/javacpp (or point it to wherever you’ve got both a more regular file system and enough disk space).
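If you want to check a candidate cache directory before pointing JavaCPP at it, a small probe along the following lines may help. It is only a sketch: the class name CacheDirProbe and the method supportsLocking are made up for illustration, and it simply attempts the same kind of FileChannel.lock call that fails in the stack trace above. Setting the property from code is an assumption that only holds if it happens before any JavaCPP or ND4J class is loaded; passing -D on the command line as described above is the simpler route.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of JavaCPP or ND4J.
class CacheDirProbe {

    /** Returns true if a file inside dir can be locked via FileChannel.lock(). */
    static boolean supportsLocking(Path dir) {
        Path probe = null;
        try {
            Files.createDirectories(dir);
            probe = Files.createTempFile(dir, "lock-probe", ".tmp");
            // Same kind of call that throws "Function not implemented" in the trace.
            try (FileChannel channel = FileChannel.open(probe, StandardOpenOption.WRITE);
                 FileLock lock = channel.lock()) {
                return lock.isValid();
            }
        } catch (IOException | UnsupportedOperationException e) {
            // Directory not usable, or the filesystem refuses to lock files.
            return false;
        } finally {
            if (probe != null) {
                try {
                    Files.deleteIfExists(probe);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the probe file.
                }
            }
        }
    }

    public static void main(String[] args) {
        // /tmp/javacpp is the directory suggested in this thread; pass another path to test it.
        Path candidate = Paths.get(args.length > 0 ? args[0] : "/tmp/javacpp");
        if (supportsLocking(candidate)) {
            // Assumption: this runs before any JavaCPP/ND4J class has been loaded.
            System.setProperty("org.bytedeco.javacpp.cachedir", candidate.toString());
            System.out.println("File locking works in " + candidate);
        } else {
            System.out.println("File locking failed in " + candidate + ", try another directory");
        }
    }
}
```

Running it against both the home folder and /tmp/javacpp should show quickly whether the filesystem is the culprit. It does not check for free disk space, which the cache directory also needs.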

When running with that option I get a different error:

Exception in thread "main" java.lang.ExceptionInInitializerError
at org.nd4j.jita.concurrency.CudaAffinityManager.getNumberOfDevices(CudaAffinityManager.java:136)
at org.nd4j.jita.constant.ConstantProtector.purgeProtector(ConstantProtector.java:60)
at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:53)
at org.nd4j.jita.constant.ConstantProtector.(ConstantProtector.java:41)
at org.nd4j.jita.constant.ProtectedCudaConstantHandler.(ProtectedCudaConstantHandler.java:69)
at org.nd4j.jita.constant.CudaConstantHandler.(CudaConstantHandler.java:38)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.nd4j.common.config.ND4JClassLoading.loadClassByName(ND4JClassLoading.java:62)
at org.nd4j.common.config.ND4JClassLoading.loadClassByName(ND4JClassLoading.java:56)
at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5152)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5093)
at org.nd4j.linalg.factory.Nd4j.(Nd4j.java:270)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraphHelper(ModelSerializer.java:506)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:462)
at org.deeplearning4j.util.ModelSerializer.restoreComputationGraph(ModelSerializer.java:647)
at org.deeplearning4j.nn.graph.ComputationGraph.load(ComputationGraph.java:4618)
at edu.mit.ll.seamnet.DL4JEnhancementDNN.(DL4JEnhancementDNN.java:54)
at edu.mit.ll.seamnet.EnhancementDNNFactory.newInstance(EnhancementDNNFactory.java:29)
at edu.mit.ll.seamnet.SpeechEnhancement.(SpeechEnhancement.java:46)
at edu.mit.ll.seamnet.SpeechEnhancement.(SpeechEnhancement.java:52)
at edu.mit.ll.seamnet.SpeechEnhancement.main(SpeechEnhancement.java:268)
Caused by: java.lang.RuntimeException: ND4J is probably missing dependencies. For more information, please refer to: https://deeplearning4j.konduit.ai/nd4j/backend
at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:116)
at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:37)
… 22 more
Caused by: java.lang.UnsatisfiedLinkError: no jnind4jcuda in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860)
at java.lang.Runtime.loadLibrary0(Runtime.java:871)
at java.lang.System.loadLibrary(System.java:1124)
at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1718)
at org.bytedeco.javacpp.Loader.load(Loader.java:1328)
at org.bytedeco.javacpp.Loader.load(Loader.java:1132)
at org.nd4j.nativeblas.Nd4jCuda.(Nd4jCuda.java:10)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.nd4j.common.config.ND4JClassLoading.loadClassByName(ND4JClassLoading.java:62)
at org.nd4j.common.config.ND4JClassLoading.loadClassByName(ND4JClassLoading.java:56)
at org.nd4j.nativeblas.NativeOpsHolder.(NativeOpsHolder.java:88)
… 23 more
Caused by: java.lang.UnsatisfiedLinkError: /tmp/javacpp/SeamNet-1.0-bin.jar/org/nd4j/nativeblas/linux-x86_64/libjnind4jcuda.so: libnd4jcuda.so: cannot open shared object file: No such file or directory
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1817)
at java.lang.Runtime.load0(Runtime.java:810)
at java.lang.System.load(System.java:1088)
at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1668)
… 31 more

Thoughts?

Now we are getting somewhere. Does the system have CUDA installed, or do you have the CUDA presets packaged into your uber jar?

I dug deeper once the cache directory was properly set on my system and realized that some CUDA libraries were missing. I installed them and the application is now running. Thank you!