Performance issue

SaviorD7 · September 29, 2020, 6:35pm

Hello everyone!

I built a neural network and ran some experiments.

I ran it on my local machine (4 cores, 8 threads, 8 GB memory + SSD). Here are the performance log and results:
Time: about 50 minutes
Accuracy: 83.7%

Next I ran the same NN in Google Cloud on a VM with 1 core and 6.5 GB memory:
Time: about 40 minutes
Accuracy: the same, 83.7%
Screenshots (dark theme is the local machine, white is the VM in Google Cloud)
So the question is: why is there a difference in training time? Moreover, the result is better on the machine with weaker hardware.

In addition: how can I increase performance? The same NN in TF trains in 10 minutes, which is 4 times faster than dl4j.
Maybe I can turn on more threads? I have 8, but only 4 were used, or what?
I really do not understand this difference.

agibsonccc · September 29, 2020, 9:32pm

If we can’t run it ourselves, we can’t tell you why it’s faster or slower.
Could you give us the complete information? We’d need to be able to set up tensorflow as well.
Imagine everything we’d need to run a benchmark ourselves, the 2 projects side by side, the versions of everything you used, what OSes you used for each as well as how you ran each one.

Screenshots from some logs and “tensorflow was faster and it was roughly something similar” doesn’t really help us determine much.

SaviorD7 · September 29, 2020, 9:52pm

Let’s leave TF aside for now.

What’s wrong with two examples on different machines?
What other information do you need to answer?
The NN is absolutely the same.
The VM has less performance but gets the better result… why?

agibsonccc · September 29, 2020, 10:03pm

@SaviorD7 I need literally everything you used to run it:

Source code
Java command you use to run it.
If you have it, a similar dataset (it doesn’t have to be the exact one)

If I can’t reproduce it I can’t really help you. What you’re essentially asking me to do is download your code, run it myself, run instrumentation tools to identify the bottlenecks and compare them.

I’m not going to guess, I’m going to measure and verify. After that I can offer suggestions and see what differences might affect your code running.

If you want, you can also add in PerformanceListener (deeplearning4j/PerformanceListener.java at master · eclipse/deeplearning4j · GitHub) to measure performance and compare them as well.
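To give a rough idea of the kind of number such a listener reports, here is a minimal hand-rolled throughput timer in plain Java. This is not DL4J's PerformanceListener and does not use its API; `ThroughputTimer`, `trainOneBatch`, the batch size, and the step count are all hypothetical stand-ins chosen for illustration.

```java
// A plain-Java sketch of throughput measurement (samples per second).
// NOT DL4J's PerformanceListener; every name here is hypothetical.
class ThroughputTimer {
    /** Samples processed per second, given a sample count and elapsed nanoseconds. */
    static double samplesPerSecond(long samples, long elapsedNanos) {
        return samples / (elapsedNanos / 1_000_000_000.0);
    }

    /** Hypothetical stand-in for one training step; replace with real work. */
    static double trainOneBatch(int batchSize) {
        double acc = 0.0;
        for (int i = 0; i < batchSize * 1_000; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        int batchSize = 32;  // hypothetical batch size
        int batches = 200;   // arbitrary number of steps to time
        double sink = 0.0;   // accumulate results so the work is not optimized away

        long start = System.nanoTime();
        for (int b = 0; b < batches; b++) {
            sink += trainOneBatch(batchSize);
        }
        long elapsed = System.nanoTime() - start;

        long samples = (long) batchSize * batches;
        System.out.println("Samples/sec: " + samplesPerSecond(samples, elapsed));
        System.out.println("(checksum " + sink + ")");
    }
}
```

Running the same measurement on both machines gives numbers that can be compared directly, instead of comparing total wall-clock time from screenshots.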

As for the machines, it’d be nice to know which size of VM you had on GCP as well.

Again, I can’t answer much of “why” without running it myself.

SaviorD7 · September 30, 2020, 4:37pm

So,

Source code: /* ***************************************************************************** - Pastebin.com
What exactly do you need? I’m using: "C:\Program Files\Java\jdk-14.0.2\bin\java.exe"
Dataset: data - Google Drive

agibsonccc · September 30, 2020, 9:28pm

@SaviorD7 Beautiful, thanks. Let me take a look. I’ll try to figure out what’s going on.

Edit: @SaviorD7 could you tell me the exact command you used, like:

java -cp somejar.jar your.main.class

I need to know that to know how to run your program. One big issue you’re likely running into is memory constraints. Sometimes Java sets default heap sizes that constrain performance.
One thing most people don’t realize is that Python isn’t afraid to keep expanding a program’s memory usage until the computer dies.
Java’s GC constrains itself, and that causes performance regressions rather than killing the machine.
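One way to check the heap point on each machine is to print the limits the JVM actually picked. The sketch below uses only standard `java.lang.Runtime` calls; the class name `EnvReport` and the helper `toMiB` are made up for this example.

```java
// Prints JVM limits that commonly differ between two machines.
// EnvReport and toMiB are hypothetical names used for illustration.
class EnvReport {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the ceiling the heap may grow to (from -Xmx or a JVM default).
        System.out.println("Max heap (MiB):       " + toMiB(rt.maxMemory()));
        // totalMemory(): heap the JVM has currently reserved.
        System.out.println("Committed heap (MiB): " + toMiB(rt.totalMemory()));
        // availableProcessors(): logical processors visible to this JVM.
        System.out.println("Logical processors:   " + rt.availableProcessors());
        System.out.println("Java version:         " + System.getProperty("java.version"));
    }
}
```

If the reported max heap is much smaller than the machine's RAM, starting the JVM with a larger `-Xmx` value (for example `-Xmx4g`) raises that ceiling. Whether that explains the timings in this thread is still unverified.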

Honestly though, this is just a guess. Without me being able to run things for myself I can’t really help you.

You also said “put tensorflow aside for a moment”. Actually, it’d be really helpful to run something side by side. If something is slower, we should know about it and at least have it be a known issue.

SaviorD7 · October 2, 2020, 12:13am

My shortened command is:

About TF: I built the same NN in TF and dl4j. You can see the dl4j results above.
Also, the NN in TF trained about 4 times faster (10 to 15 minutes, 20 at most).

agibsonccc · October 2, 2020, 1:12pm

Right, but do you have a specific example for me? I’d still prefer to run it myself for comparison.

agibsonccc · October 2, 2020, 1:38pm

@SaviorD7 so playing with your script a bit… is there any reason your batch size was so small? I still highly doubt you were running something equivalent. I changed the batch size to 1000 and it finished pretty quickly. You have more than enough data to increase the batch size. There’s zero reason to have it at 25.
I’d really like to see your TF script. Also, please use the performance listener.
Here’s my modified version that does that: gist:d232723ef36d8cb425c58c8750dec50e · GitHub
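To put the batch-size point in numbers: the count of iterations (parameter updates) per epoch is the dataset size divided by the batch size, rounded up. The sketch below compares the two batch sizes mentioned above, 25 and 1000. The dataset size of 100,000 examples is an assumption made purely for illustration (the real size is not stated in this thread), and `BatchMath` is a made-up name.

```java
// Iterations per epoch for different batch sizes.
// The dataset size is hypothetical; only 25 and 1000 come from the thread.
class BatchMath {
    /** Number of iterations needed to see every example once (ceiling division). */
    static long iterationsPerEpoch(long examples, int batchSize) {
        return (examples + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long examples = 100_000L; // assumed dataset size, not from the thread
        for (int batchSize : new int[] {25, 1000}) {
            System.out.println("batch " + batchSize + " -> "
                    + iterationsPerEpoch(examples, batchSize) + " iterations/epoch");
        }
        // With the assumed size: 4000 iterations at batch 25, 100 at batch 1000.
    }
}
```

Each iteration carries fixed overhead, so 40 times fewer iterations per epoch can cut wall-clock time considerably. A larger batch also changes how the optimizer behaves, so accuracy should be re-checked after the change.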
