CPU vs GPU difference


I have a question about the “result” of training on CPU vs GPU on 2 very different hosts (OSX vs Jetson).

I'm seeing some very strange behaviour. I run exactly the same code, with exactly the same input, on the 2 hosts. On OSX the training progresses (in terms of accuracy & precision) steadily until it reaches 0.98 for both (more or less),
and my custom logs report

score improvement from 49.34313231650931 to 49.432347998504916
score improvement from 49.432347998504916 to 49.52178271262818
score improvement from 49.52178271262818 to 49.96684361616423

and so on

On the Jetson the first iteration reports an improvement, but then the score doesn't rise anymore, no matter whether I leave it running for 2 hrs or 1 minute.

On OSX every iteration makes the score “change” (sometimes better, sometimes worse); on the Jetson only the very first iteration makes things change, then it stays exactly the same.

I don't think the issue can be how I configured the network (or how I normalize the data), since the code is exactly the same (same for the datasets, they are equal), so it must be something in the native implementation, probably? Maybe a lack of precision or loss of information when using CUDA?
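To illustrate the precision hypothesis (this is just a sketch in plain Java, not my actual training code): if one backend accumulates the score in float32 while the other uses float64, a small update that float64 still resolves can be silently lost in float32, which would look exactly like a score that stops moving after the first step.

```java
public class PrecisionDemo {
    public static void main(String[] args) {
        // float32 has ~7 significant digits: near a value around 49,
        // the spacing between representable floats is about 4e-6,
        // so an update of 1e-7 is rounded away entirely.
        float f = 49.432347f;
        float fUpdated = f + 1e-7f;

        // float64 resolves ~16 significant digits, so the same
        // tiny update still changes the value.
        double d = 49.432347998504916;
        double dUpdated = d + 1e-7;

        System.out.println(fUpdated == f); // true: float32 update was lost
        System.out.println(dUpdated == d); // false: float64 update survives
    }
}
```

If something like this is the cause, the stall would depend on the backend's default data type rather than on the model configuration.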

Has anybody ever experienced this?

I did one more test: I tried to start it in CPU mode on the Jetson, and this is what I got:

java.lang.IllegalStateException: Cannot perform evaluation with NaNs present in predictions: 8625 NaNs present in predictions INDArray

So, it's yet another different behaviour.

Basically, 3 different behaviours:
osx → code works
jetson+cuda → training doesn't proceed after 1 step (same score, no improvement)
jetson+cpu → training proceeds, the score increases, but after a while it reports NaN(s)
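For the jetson+cpu case: a minimal sketch (plain Java, the array values are made up for illustration) of why a single bad operation can end with thousands of NaNs in the predictions array, as in the exception above. Once one NaN appears (e.g. from 0/0 in normalization or an overflowing loss), it propagates through every later arithmetic step.

```java
public class NanPropagationDemo {
    public static void main(String[] args) {
        // One bad op (0/0, overflow minus overflow, etc.) produces NaN...
        double bad = 0.0 / 0.0;

        // ...and NaN poisons all further arithmetic it touches,
        // so a single bad batch can spread through the whole network output.
        double[] predictions = {0.1, 0.9, bad, 0.5};
        double sum = 0.0;
        for (double p : predictions) {
            sum += p;
        }

        System.out.println(Double.isNaN(sum)); // true: one NaN poisoned the sum
    }
}
```

A cheap early-warning check in training code is to test a reduction (like the sum above) with `Double.isNaN` each iteration, instead of only finding out at evaluation time.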