AMD Ryzen 5000 CPU - Poor Performance

I recently upgraded from an Intel i7-4790K CPU to a new AMD Ryzen 7 5700G. I was expecting large performance improvements when training neural nets. To my disappointment, instead of being faster, training takes about 20x longer than on my 7-year-old Intel CPU, which has half the cores. I’m shocked and disappointed.

I tried to solve the problem myself. I went to the CPU optimization page and enabled AVX2, but that optimization had little effect. I then read the section about Intel MKL. I have never set the system property described in the instructions in my code. Was MKL being loaded automatically when I was running on the Intel CPU?
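
For reference, this is the system property described on that page; as I understand the docs, it has to be set before any ND4J classes are loaded:

// As I understand it, this must run before any ND4J class is loaded to take effect.
System.setProperty("org.bytedeco.openblas.load", "mkl");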

Is the Intel MKL library a major source of optimization? Do AMD CPUs support the MKL library?

Is there anything I can do to get the AMD Ryzen CPUs to be fast or should I just return it and buy another Intel CPU?

I’m running 1.0.0-beta7 on Ubuntu 16.04, by the way.

@daviddbal Feel free to use the onednn classifier described on that page; AMD CPUs will generally have issues with MKL.
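
For example, something like this in your build.gradle (a sketch: adjust the version to match your setup, and pair it with the plain nd4j-native dependency, as discussed further down):

// onednn + AVX2 binaries for Linux, alongside the no-classifier nd4j-native
implementation "org.nd4j:nd4j-native:1.0.0-M1.1:linux-x86_64-onednn-avx2"
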
Beyond that, take a look at this issue: AVX2 brings no performance improvement? · Issue #9417 · eclipse/deeplearning4j · GitHub
We’re going to do some more in-depth profiling to identify the issues. It’s likely related to the new GitHub Actions build machines we’re using.

I tried to use linux-x86_64-onednn-avx2 with 1.0.0-M1 and 1.0.0-M1.1, and both cause an ExceptionInInitializerError:

Caused by: java.lang.RuntimeException: ND4J is probably missing dependencies. For more information, please refer to: https://deeplearning4j.konduit.ai/nd4j/backend
at org.nd4j.nativeblas.NativeOpsHolder.<init>(NativeOpsHolder.java:116)
at org.nd4j.nativeblas.NativeOpsHolder.<clinit>(NativeOpsHolder.java:37)
... 57 more
Caused by: java.lang.NoClassDefFoundError: org/nd4j/common/config/ND4JClassLoading

@daviddbal I have a feeling you’re not using the pom right. You need both the non-classifier version (which has the class files) and the classifier version (with the actual platform-specific binaries), with the same group ID, artifact ID, and version.

If including both doesn’t work, can you include your full pom.xml?

I’m using Gradle instead of Maven. Here are all the dependencies related to DL4J:

ext {
    dl4j_version = "1.0.0-M1.1"
    kotlin_version = "1.3.30"
}

compile "org.datavec:datavec-api:${dl4j_version}"
compile "org.datavec:datavec-spark_2.11:${dl4j_version}"
compile "org.datavec:datavec-local:${dl4j_version}"
compile group: 'org.apache.spark', name: 'spark-core_2.11', version:'2.4.3'
compile group: 'io.netty', name: 'netty-all', version:'4.1.42.Final'
implementation "org.deeplearning4j:deeplearning4j-core:${dl4j_version}"
implementation "org.nd4j:nd4j-native-platform:${dl4j_version}"
compile "org.nd4j:nd4j-api:${dl4j_version}"
compile "org.nd4j:nd4j-native:${dl4j_version}:linux-x86_64-avx2"

// compile "org.nd4j:nd4j-native:${dl4j_version}:linux-x86_64-onednn-avx2"
compile "org.deeplearning4j:deeplearning4j-nlp:${dl4j_version}"

Now I’m seeing this error:
Caused by: java.lang.UnsatisfiedLinkError: /home/bal/.javacpp/cache/nd4j-native-1.0.0-M1.1-linux-x86_64.jar/org/nd4j/nativeblas/linux-x86_64/libjnind4jcpu.so: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.22' not found (required by /home/bal/.javacpp/cache/nd4j-native-1.0.0-M1.1-linux-x86_64.jar/org/nd4j/nativeblas/linux-x86_64/libnd4jcpu.so)

@daviddbal Ah yeah, with Gradle you need to specify both the non-classifier artifact and the specific classifier. I’m not really sure how the -platform artifacts interact with Gradle, but I would specify everything explicitly to be on the safe side.
You’re still missing the one without the classifier. The classifiers just have the native artifacts in them.
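
In build.gradle terms, that pairing looks something like this (a sketch using the classifier from your own snippet):

compile "org.nd4j:nd4j-native:${dl4j_version}"                   // class files, no classifier
compile "org.nd4j:nd4j-native:${dl4j_version}:linux-x86_64-avx2" // platform-specific native binaries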

Also, please avoid having multiple classifiers like that on the classpath, to avoid more sources of errors.

Regarding the glibc/libstdc++ error, which OS are you running? You might be restricted to just using the compat classifier. We introduced that classifier (built on CentOS 6) for people with older machines.

I am using Ubuntu 16.04.

When you mention “platform”, are you referring to “nd4j-native-platform”? That artifact is needed as a dependency for org.nd4j.linalg.cpu.nativecpu.NDArray.

Are you telling me that each dependency requires a classifier, not just nd4j-native?

@daviddbal Could you clarify what gave you that impression? The core dependency for that is nd4j-api.
The nd4j-native/nd4j-cuda artifacts have that as a dependency already.

All nd4j-native-platform does is give you a simple way of specifying one dependency that covers all the different platforms. If you actually look at the nd4j-native-platform pom.xml, you can see that.
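
Illustratively, it has roughly the same effect as declaring the default artifact plus one classified artifact per platform. A sketch (not the literal pom; the real one lists more platforms):

compile "org.nd4j:nd4j-native:${dl4j_version}"                 // class files
compile "org.nd4j:nd4j-native:${dl4j_version}:linux-x86_64"    // Linux binaries
compile "org.nd4j:nd4j-native:${dl4j_version}:windows-x86_64"  // Windows binaries
compile "org.nd4j:nd4j-native:${dl4j_version}:macosx-x86_64"   // macOS binaries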

Beyond that, I’m not really sure what to say here. I already explained what each one does.

The one without the classifier has the class files in it, and the classifiers themselves just contain the native artifacts compiled a specific way (like CPU-only, GPU, or CPU with AVX).

In that case, you will need one artifact that just has the class files and another with the binaries for the specific platform you want to use.

In our tutorials, Maven typically does this for you: it will automatically pull the relevant classifier plus the default artifact. Gradle does not do this.

Edit:
@daviddbal If you’re wondering why we have this system, it’s because the native binaries can only be loaded once by any JVM runtime. Once you load a library into memory, the JVM can’t load it again.
All of the relevant classifiers are artifacts with the same name but compiled a different way.

All of those are based on the same c++ code base underneath: deeplearning4j/libnd4j at master · eclipse/deeplearning4j · GitHub

Hopefully that helps you understand a bit more about what’s going on there.

To recap: the non-classifier artifact has the actual Java class files you see, and the classifier artifacts contain binaries compiled a certain way. That’s our way of letting the user say “I want to use math libraries compiled this way.” I know it’s complicated, but people have different platforms they want to run on, as well as different requirements like binary size, target OS, and architecture.

Yes, your explanation helps me understand how the system works and why it’s built the way it is. However, I still don’t know which artifacts require classifiers and which do not.

In this tutorial:

I only see nd4j-native with a classifier.

Also, I didn’t copy and paste from my build.gradle file correctly above. I have both the non-classifier and classifier versions of nd4j-native:

compile "org.nd4j:nd4j-native:${dl4j_version}"
compile "org.nd4j:nd4j-native:${dl4j_version}:${dl4j_classifier}"

Also, if I use 1.0.0-beta7 as my dl4j_version, it builds and runs correctly. I only get errors with 1.0.0-M1 and 1.0.0-M1.1.

@daviddbal I"ve explained it multiple times. I’m not really sure what more to say on the subject. You include the non classifier version and the classifier version. You have it right there.

If that’s still failing on your setup due to the glibc issue, consider using the compat classifier:
https://repo1.maven.org/maven2/org/nd4j/nd4j-native/1.0.0-M1.1/
https://repo1.maven.org/maven2/org/nd4j/nd4j-native/1.0.0-M1.1/nd4j-native-1.0.0-M1.1-linux-x86_64-compat.jar
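
In your build.gradle, that compat setup would look something like this sketch:

compile "org.nd4j:nd4j-native:${dl4j_version}"                     // class files
compile "org.nd4j:nd4j-native:${dl4j_version}:linux-x86_64-compat" // binaries built on CentOS 6 for older glibc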

You have the correct setup; now it’s just about finding which binaries run on Ubuntu 16. We do build all of the various classifiers on Ubuntu 16, though. Failing that, compat should work where the other classifiers do not.

Keep the non-classifier one and just have one classifier (the binaries you want to use) in your build.gradle.

Sorry for the confusion; I know exposing all of this comes with a certain amount of complexity. Generally we just recommend people use the defaults and not think about it too much.

If you aren’t seeing many benefits, then we can see what happens after we run some benchmarks. I’ve also commented on your other issues here: AVX2 brings no performance improvement? · Issue #9417 · eclipse/deeplearning4j · GitHub

Well, the one thing you haven’t said even once is whether the only dependency that requires a classifier is nd4j-native, but I think it’s implied in your last message. Honestly, while I appreciate your help, I would like more clarity and less condescending language.

Is there a working example with 1.0.0-M1.1?

I can convert the maven pom into a gradle build file easy enough.

@daviddbal Sorry for the frustration. For Gradle, you just need the two lines you already mentioned.
Here’s an example build.gradle I just tested:

plugins {
    id 'java'
}

group 'org.nd4j'
version '1.0-SNAPSHOT'

repositories {
    mavenCentral()
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.12'
    // class files (no classifier)
    compile group: 'org.nd4j', name: 'nd4j-native', version: '1.0.0-M1.1'
    // platform-specific native binaries
    compile group: 'org.nd4j', name: 'nd4j-native', version: '1.0.0-M1.1', classifier: 'windows-x86_64'
    // the BLAS backend nd4j-native links against, plus its native binaries
    compile group: 'org.bytedeco', name: 'openblas', version: '0.3.17-1.5.6'
    compile group: 'org.bytedeco', name: 'openblas', version: '0.3.17-1.5.6', classifier: 'windows-x86_64'
}

It includes the necessary dependencies, including openblas, for you.
Just change windows-x86_64 to linux-x86_64 or whatever classifier you would like.
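
For the Ubuntu 16.04 machine in this thread, that substitution gives:

compile group: 'org.nd4j', name: 'nd4j-native', version: '1.0.0-M1.1', classifier: 'linux-x86_64'
compile group: 'org.bytedeco', name: 'openblas', version: '0.3.17-1.5.6', classifier: 'linux-x86_64'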

On AMD, I would recommend staying away from any of the MKL-related dependencies like onednn, as MKL does have built-in problems on AMD CPUs. This is a fairly well-known problem in the community.
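
Concretely: if something like the line below (the JavaCPP OpenBLAS loader property) appears anywhere in your code, remove it on AMD so the bundled OpenBLAS is used instead:

// Avoid on AMD: this points the OpenBLAS wrapper at MKL instead of the bundled OpenBLAS.
// System.setProperty("org.bytedeco.openblas.load", "mkl");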

Regarding my earlier comments, I apologize for those as well. I was mostly trying to express that I’m not really sure what more to say about the situation, since I only have this one recommendation. The build.gradle should hopefully communicate that better.

@daviddbal I added more context about the issues on AMD processors to the GitHub issue. We’ll run some tests and add a transparent fix for users so this doesn’t come up in the future. Thanks again for reporting!

Thank you for your new comments and the example build.gradle. That clearly shows which dependencies are needed and answers all my questions.

Do you have much hope that the performance problem I’m facing with the AMD CPU will be fixed any time soon? The return window for my new CPU ends in 22 days. If the problem is likely to persist for a while I should return it and reassess after Intel’s new Alder Lake CPUs come out in a few months.

@daviddbal We’ll profile on our own internal AMD boxes and get back to you. Worst-case scenario, you just use snapshots when/if we deploy a fix for this. Setting the property mentioned in the GitHub issue might be worth a shot.
If you could do that and report your results as well, we’d appreciate it. I’ve pushed up our own benchmark toolkit that already has JMH setup as well: Add benchmark to contrib by agibsonccc · Pull Request #9421 · eclipse/deeplearning4j · GitHub

This is what we’ll be running on our own machines, with different nd4j classifiers specified, to see the differences in performance.

I set the system property:

System.setProperty("org.bytedeco.openblas.load", "mkl");

It didn’t improve performance on the AMD computer. This isn’t surprising, as the MKL library cripples AMD CPUs (something I’ve been learning more about recently).

Are you asking me to run a benchmark test as well? I’m not sure why you mentioned the pull request.