Running ND4J against CUDA 11.6

I am attempting to run ND4J against NVIDIA CUDA driver 11.6 (Linux x86_64). The latest official ND4J release that I can find is for driver version 11.2.

What are the odds that I can run the ND4J built for 11.2 against NVIDIA Driver 11.6?

If I can’t do it, is solving this as simple as recompiling ND4J on my machine with NVIDIA Driver 11.6, so it links up with the newer driver?

Thank you, Chris

@chris2 we’re working on an 11.6 release soon.

For now you’ll have to compile from source. If you’re interested I can help with that.
Otherwise, ND4J built for CUDA 11.2 mainly needs the matching CUDA SDK (toolkit) installed. The underlying driver and the SDK are not necessarily the same thing, and their versions can differ.
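
To see the driver and the SDK separately on a given machine, something like the following sketch may help. It assumes the standard NVIDIA tools: nvidia-smi reports the driver version, and nvcc (if the toolkit is installed and on the PATH) reports the SDK release. The function name report_versions is made up for this example.

```shell
#!/bin/sh
# Report the NVIDIA driver version and the CUDA toolkit (SDK) version separately.
# report_versions is a hypothetical helper name used only in this sketch.

report_versions() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Driver version as reported by the driver itself.
        echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    else
        echo "driver: nvidia-smi not found"
    fi

    if command -v nvcc >/dev/null 2>&1; then
        # nvcc prints a line containing the word "release" followed by the toolkit version.
        echo "toolkit: $(nvcc --version | grep -i 'release')"
    else
        echo "toolkit: nvcc not found"
    fi
}

report_versions
```

If the two lines show different versions, that alone is not necessarily a problem; it is the toolkit version that the ND4J CUDA build is tied to.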

Yes. I am interested in that! I do not have access to the machines this week, but I should next week.

I have made some progress along these lines. For various reasons, maybe bad ones, I am building the tag 1.0.0-M1.1. The drivers are installed, and we installed and compiled blas, and then got libnd4j to compile. I will provide more details next week. Thank you for offering to help.

@chris2 yeah, that’s all you need to do. There’s nothing wrong with using M1.1; I’m just not sure whether we supported all of the various CMake flags at that time. If not, you can compile against master just fine. For now, just try that and see what it does.

When we add new CUDA versions, we occasionally have to update the relevant compilers and the versions of the things we compile against. Nothing should stop you from doing that yourself, as long as you know what to look for.

Hello. It turns out I have not actually succeeded in building the deeplearning4j jar files.

I changed the version in my files to 1.0.0-M1.1-custom using the update-versions.sh shell script. Then, in the deeplearning4j root directory, I ran:
mvn clean install -Djavacpp.platform=linux-x86_64 -DskipTests -DskipTestResourceEnforcement=true

Unfortunately, this does not produce the classified nd4j-native and nd4j-native-platform jar files the way I expected. It produces an unclassified jar that is nearly empty: it contains only a pom file and a properties file. No classes!

I looked at this for inspiration: build-deploy-linux-x86_64.yml on the master branch of the eclipse/deeplearning4j repository on GitHub.

Are there more detailed instructions somewhere?

Thank you,
Chris

@chris2 make sure you add -Pcpu. For cuda you need -Pcuda.

Otherwise the backends are excluded from the build. We do that because the compiler or other tools that a given backend needs may not be present on the build machine.
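
As a rough sketch, the command from the earlier post with a backend profile added would look like the output of the script below. The profile names come from this reply and the other flags from the original command; whether every flag behaves the same on the M1.1 tag is not confirmed here. BACKEND and build_cmd are hypothetical names for this example, and the script only prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Assemble the Maven command from the thread with a backend profile added.
# BACKEND is a hypothetical variable for this sketch: "cpu" or "cuda".
BACKEND="${BACKEND:-cpu}"

build_cmd() {
    # $1 is the Maven profile name that enables the backend.
    echo "mvn clean install -P$1 -Djavacpp.platform=linux-x86_64 -DskipTests -DskipTestResourceEnforcement=true"
}

# Print the command rather than run it, so it can be checked first.
build_cmd "$BACKEND"
```

Running it with BACKEND=cuda prints the CUDA variant; the CUDA build additionally needs the CUDA toolkit available on the build machine.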

Ok! Can I build for both at the same time if I do -Pcpu -Pcuda?

@chris2 just FYI, the nd4j-cuda-11.6 release for M2.1 is published. Go ahead and give it a try.
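
One way to check that the published artifacts resolve before changing a project POM is Maven's dependency:get goal. The coordinates in this sketch are an assumption based on the usual ND4J naming pattern (group org.nd4j, a -platform artifact, version 1.0.0-M2.1), so please confirm them on Maven Central first. fetch_cmd is a hypothetical helper name.

```shell
#!/bin/sh
# ASSUMPTION: these coordinates follow the usual ND4J naming pattern;
# verify them on Maven Central before relying on them.
GROUP_ID="org.nd4j"
ARTIFACT_ID="nd4j-cuda-11.6-platform"
VERSION="1.0.0-M2.1"

fetch_cmd() {
    # dependency:get downloads an artifact into the local Maven repository.
    echo "mvn dependency:get -Dartifact=${GROUP_ID}:${ARTIFACT_ID}:${VERSION}"
}

# Print the command for review instead of running it directly.
fetch_cmd
```

If the download succeeds, the same coordinates can go into the project's dependency list in place of the locally built jars.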

Thanks! I have passed this on to my team, and they’re working the task. We look forward to seeing it run faster.