Running ND4J against CUDA 11.6

chris2 · June 18, 2022, 8:48pm

I am attempting to run ND4J against NVIDIA CUDA driver 11.6 (Linux x86_64). The latest official ND4J release I can find is for driver version 11.2.

What are the odds that I can run the ND4J built for 11.2 against NVIDIA Driver 11.6?

If I can’t, is solving this as simple as recompiling ND4J on my machine against NVIDIA driver 11.6, so that it links against the newer driver?

Thank you, Chris

agibsonccc · June 19, 2022, 1:02pm

@chris2 we’re working on an 11.6 release soon.

For now you’ll have to compile from source. If you’re interested, I can help with that.
Otherwise, nd4j on CUDA 11.2 mainly needs the 11.2 SDK. The underlying driver and the SDK are not necessarily the same thing.
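To make the driver/SDK distinction concrete, here is a minimal Python sketch under a simplified rule: a driver can run code built against a CUDA toolkit whose version is at most the highest CUDA version the driver reports (the "CUDA Version" field that `nvidia-smi` prints). The helper names and the sample header line are hypothetical, and the sketch deliberately ignores finer points such as CUDA minor-version compatibility and the separate libraries (cuBLAS, cuDNN, and so on) that a given ND4J build needs.

```python
import re

def parse_version(text: str) -> tuple[int, int]:
    """Parse a 'major.minor' CUDA version string such as '11.6'."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)

def driver_cuda_version(nvidia_smi_header: str) -> tuple[int, int]:
    """Extract the 'CUDA Version: X.Y' field from an nvidia-smi header line.

    That field is the highest CUDA version the installed driver supports;
    it says nothing about which toolkit (SDK) is installed.
    """
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", nvidia_smi_header)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return parse_version(match.group(1))

def driver_can_run(driver_max: tuple[int, int], built_for: tuple[int, int]) -> bool:
    """Simplified backward-compatibility rule: a newer driver runs code
    built against an older toolkit. Minor-version compatibility is ignored."""
    return driver_max >= built_for

# Hypothetical header line in the style nvidia-smi prints.
header = "| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |"
driver_max = driver_cuda_version(header)
print(driver_max)                                          # (11, 6)
print(driver_can_run(driver_max, parse_version("11.2")))   # True
```

Under this rule, an 11.6 driver accepts code built for 11.2; whether the 11.2 SDK itself is installed is a separate check, which is the distinction drawn above.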

chris2 · June 21, 2022, 1:53am

Yes. I am interested in that! I do not have access to the machines this week, but I should next week.

I have made some progress along these lines. For various reasons, maybe bad ones, I am building from the 1.0.0-M1.1 tag. The drivers are installed, we installed and compiled BLAS, and then we got libnd4j to compile. I will provide more details next week. Thank you for offering to help.

agibsonccc · June 21, 2022, 2:09am

@chris2 yeah, that’s all you need to do. There’s nothing wrong with using M1.1; I’m just not sure whether we supported all of the various CMake flags for it at the time. If not, you can compile against master just fine. For now, just try that and see what happens.

When we add a new CUDA version, we occasionally have to update the relevant compilers and the versions of the dependencies we compile against. Nothing should stop you from doing that as long as you know what to look for.

chris2 · June 30, 2022, 7:10pm

Hello. It turns out I have not actually succeeded in building the deeplearning4j JAR files.

I used the update-versions.sh shell script to change the version of my files to 1.0.0-M1.1-custom. Then, in the deeplearning4j root directory, I ran:
mvn clean install -Djavacpp.platform=linux-x86_64 -DskipTests -DskipTestResourceEnforcement=true

Unfortunately, this does not produce the classified nd4j-native and nd4j-native-platform JAR files I expected. It produces only an unclassified version, and that JAR has almost nothing in it: just a POM file and a properties file. No classes!
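Because a JAR is an ordinary ZIP archive, one quick way to confirm this symptom is to list the archive's entries and count the compiled classes. The sketch below is a hypothetical diagnostic helper, not part of the deeplearning4j build; the entry names in the demo mimic the usual Maven `META-INF/maven/<groupId>/<artifactId>/` layout and are assumptions.

```python
import io
import zipfile

def class_entries(jar) -> list[str]:
    """Return the names of all .class files in a JAR (a JAR is a ZIP archive).

    `jar` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as archive:
        return [name for name in archive.namelist() if name.endswith(".class")]

def looks_empty(jar) -> bool:
    """True when the JAR holds no compiled classes at all."""
    return not class_entries(jar)

# Demo: an in-memory JAR shaped like the one described, containing only a POM
# and a properties file (entry names are illustrative).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("META-INF/maven/org.nd4j/nd4j-native/pom.xml", "<project/>")
    archive.writestr("META-INF/maven/org.nd4j/nd4j-native/pom.properties",
                     "version=1.0.0-M1.1-custom\n")

print(looks_empty(demo))  # True
```

On a real build you would pass the path of the produced artifact instead, e.g. a hypothetical `looks_empty("nd4j-native-1.0.0-M1.1-custom.jar")`.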

I looked at this for inspiration: deeplearning4j/build-deploy-linux-x86_64.yml on the master branch of eclipse/deeplearning4j (GitHub).

Are there more detailed instructions somewhere?

Thank you,
Chris

agibsonccc · June 30, 2022, 11:21pm

@chris2 make sure you add -Pcpu. For CUDA, you need -Pcuda.

Otherwise, the backends are excluded from the build. We do that because, when building one backend or the other, the compiler or other required tools may not be present.
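Applied to the command from the earlier post, the fix is to add the backend profile. The small Python sketch below assembles that invocation for one backend at a time; `build_command` and the two constants are hypothetical helpers, the flags are only those quoted in this thread, and a real CUDA build may need more than this (at minimum, the CUDA toolchain must be present on the build machine).

```python
import shlex

# Flags quoted earlier in this thread; nothing else is assumed here.
BASE_FLAGS = [
    "-Djavacpp.platform=linux-x86_64",
    "-DskipTests",
    "-DskipTestResourceEnforcement=true",
]

# Backend name -> Maven profile that pulls that backend into the build.
BACKEND_PROFILES = {"cpu": "-Pcpu", "cuda": "-Pcuda"}

def build_command(backend: str) -> list[str]:
    """Assemble the mvn argument list for a single backend (hypothetical helper)."""
    if backend not in BACKEND_PROFILES:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {sorted(BACKEND_PROFILES)}"
        )
    return ["mvn", "clean", "install", BACKEND_PROFILES[backend], *BASE_FLAGS]

print(shlex.join(build_command("cuda")))
# mvn clean install -Pcuda -Djavacpp.platform=linux-x86_64 -DskipTests -DskipTestResourceEnforcement=true
```

Selecting one profile per invocation mirrors the reason given above: each backend needs its own compiler and tools.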

chris2 · July 1, 2022, 2:57pm

Ok! Can I build for both at the same time if I do -Pcpu -Pcuda?

agibsonccc · August 16, 2022, 10:55am

@chris2 JFYI, the nd4j-cuda-11.6 release for M2.1 has been published. Go ahead and give it a try.

chris2 · August 23, 2022, 12:35am

Thanks! I have passed this on to my team, and they’re working on the task. We look forward to seeing it run faster.
