NaNs present in prediction

(Copied from GitHub: “NaNs present in predictions INDArray”, Issue #8496, eclipse/deeplearning4j)
Hi, I’m learning Deeplearning4j and tried using Groovy 3.0.7 to train an RNN for regression. It takes as input a series of “composite” values (4 doubles each), and I want to predict the next value (the next 4 doubles). I ran into this error, but I don’t know whether I’m doing something wrong or whether it is the same bug as in that issue:

2021-03-21 18:58:36.479 INFO 7402 --- [ main] org.nd4j.linalg.factory.Nd4jBackend : Loaded [JCublasBackend] backend
2021-03-21 18:58:38.044 INFO 7402 --- [ main] org.nd4j.nativeblas.NativeOpsHolder : Number of threads used for linear algebra: 32
2021-03-21 18:58:38.066 INFO 7402 --- [ main] o.n.l.a.o.e.DefaultOpExecutioner : Backend used: [CUDA]; OS: [Linux]
2021-03-21 18:58:38.066 INFO 7402 --- [ main] o.n.l.a.o.e.DefaultOpExecutioner : Cores: [8]; Memory: [5,9GB];
2021-03-21 18:58:38.066 INFO 7402 --- [ main] o.n.l.a.o.e.DefaultOpExecutioner : Blas vendor: [CUBLAS]
2021-03-21 18:58:38.073 INFO 7402 --- [ main] org.nd4j.linalg.jcublas.JCublasBackend : ND4J CUDA build version: 10.2.89
2021-03-21 18:58:38.074 INFO 7402 --- [ main] org.nd4j.linalg.jcublas.JCublasBackend : CUDA device 0: [GeForce GTX 770]; cc: [3.0]; Total memory: [2095710208]
2021-03-21 18:58:38.106 INFO 7402 --- [ main] o.d.nn.multilayer.MultiLayerNetwork : Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
2021-03-21 18:58:39.105 INFO 7402 --- [ main] i.q.b.p.PricePredictionTraining : Network Training
2021-03-21 18:58:39.984 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 0 is 254339.3
2021-03-21 18:58:40.937 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 1 is 534874.65
2021-03-21 18:58:42.204 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 2 is 1822974.8
2021-03-21 18:58:42.550 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 3 is 520241.4
2021-03-21 18:58:43.041 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 4 is 493091.85
2021-03-21 18:58:43.237 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 5 is 146476.6375
2021-03-21 18:58:43.421 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 6 is 239147.6
2021-03-21 18:58:43.627 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 7 is 368370.975
2021-03-21 18:58:44.543 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 8 is 667049.2
2021-03-21 18:58:45.455 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 9 is 1610134.4
2021-03-21 18:58:46.028 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 10 is 5576194.0
2021-03-21 18:58:46.347 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 11 is 461675.9
2021-03-21 18:58:46.678 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 12 is 343023.0
2021-03-21 18:58:47.408 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 13 is 1121107.9
2021-03-21 18:58:48.307 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 14 is 695255.3
2021-03-21 18:58:48.915 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 15 is 938870.2
2021-03-21 18:58:49.351 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 16 is 1045590.5
2021-03-21 18:58:49.664 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 17 is 201749.4375
2021-03-21 18:58:50.572 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 18 is 9143387.2
2021-03-21 18:58:51.147 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 19 is 1.333519872E8
2021-03-21 18:58:51.721 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 20 is 1201941.6
2021-03-21 18:58:51.991 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 21 is 1184513.6
2021-03-21 18:58:52.603 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 22 is 1526626.7
2021-03-21 18:58:53.460 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 23 is 1333823.9
2021-03-21 18:58:54.149 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 24 is 911977.0
2021-03-21 18:58:54.672 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 25 is 238359.65
2021-03-21 18:58:55.435 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 26 is 1787077.6
2021-03-21 18:58:56.050 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 27 is 1045883.3
2021-03-21 18:58:56.897 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 28 is 1289983.6
2021-03-21 18:58:57.800 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 29 is 794034.05
2021-03-21 18:58:57.970 INFO 7402 --- [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 30 is 434461.96875
2021-03-21 18:58:57.970 INFO 7402 --- [ main] o.d.o.listeners.EvaluativeListener : Starting evaluation nr. 1
2021-03-21 18:58:58.859 INFO 7402 --- [ main] ConditionEvaluationReportLoggingListener :

Caused by: java.lang.IllegalStateException: Cannot perform evaluation with NaNs present in predictions: 2147483647 NaNs present in predictions INDArray
at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:641) ~[nd4j-common-1.0.0-beta7.jar:na]
at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:286) ~[nd4j-common-1.0.0-beta7.jar:na]
at org.nd4j.evaluation.classification.Evaluation.eval(Evaluation.java:403) ~[nd4j-api-1.0.0-beta7.jar:1.0.0-beta7]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluationHelper(MultiLayerNetwork.java:3453) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluation(MultiLayerNetwork.java:3400) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.optimize.listeners.EvaluativeListener.invokeListener(EvaluativeListener.java:236) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.optimize.listeners.EvaluativeListener.onEpochEnd(EvaluativeListener.java:213) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1727) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1636) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1623) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork$fit.call(Unknown Source) ~[na:na]
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47) ~[groovy-3.0.7.jar:3.0.7]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125) ~[groovy-3.0.7.jar:3.0.7]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148) ~[groovy-3.0.7.jar:3.0.7]

I’ve built a simple SequenceRecordReaderDataSetIterator:

def trainData = new SequenceRecordReaderDataSetIterator(
    new InMemorySequenceRecordReader(sequences),
    5,     // minibatch size
    4,     // number of possible labels
    4,     // label index
    true   // regression
)

where sequences is a list of all the time series used for training (each record is composed of 8 values: the first 4 are the “current” time step and the last 4 are the “next” time step).

And I build the network like this:

def conf = new NeuralNetConfiguration.Builder()
    .weightInit(WeightInit.XAVIER)
    .updater(new Nadam())
    .list()
    .layer(
        new LSTM.Builder()
            .activation(Activation.TANH)
            .nIn(4).nOut(50)
            .build()
    )
    .layer(
        new RnnOutputLayer.Builder()
            .lossFunction(LossFunctions.LossFunction.MCXENT)
            .activation(Activation.SOFTMAX)
            .nIn(50).nOut(4)
            .build()
    )
    .build()

def net = new MultiLayerNetwork(conf)
net.init()

net.setListeners(new ScoreIterationListener(1), new EvaluativeListener(testData, 1, InvocationType.EPOCH_END))

net.fit(trainData, 4)

Am I doing something wrong? Is there a workaround for this until the bug is officially resolved?

I’m using Gradle for the build, and these are my dependencies:
implementation 'org.codehaus.groovy:groovy'
implementation 'org.deeplearning4j:deeplearning4j-core:1.0.0-beta7'
implementation 'org.deeplearning4j:deeplearning4j-cuda-10.2:1.0.0-beta7'
implementation 'org.nd4j:nd4j-cuda-10.2-platform:1.0.0-beta7'

The error tells you pretty much what your problem is: You’ve got NaN predictions.

Looking at the scores you are getting during training, I guess your inputs are probably not normalized.

With multi-class cross-entropy, reasonable loss values are less than 10 and ideally less than 1.

Your loss is roughly 5 to 8 orders of magnitude away from that, so I guess that your output is probably also not a probability distribution.
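To make the magnitude argument concrete, here is a short worked calculation. It assumes the textbook definition of multi-class cross-entropy for a single example with target vector y and softmax output p over the 4 outputs of this network; how DL4J aggregates the score over time steps and minibatch examples is not covered here.

```latex
L(y, p) = -\sum_{i=1}^{4} y_i \ln p_i

% One-hot target, uniform prediction p_i = 1/4:
L = -\ln\tfrac{1}{4} = \ln 4 \approx 1.386
```

So a classifier that guesses uniformly already scores about 1.4 per example. Because each term is weighted by y_i, using raw, unnormalized regression targets in place of a one-hot vector scales the loss with the size of the labels, which is consistent with the very large scores in the log above.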

I guess you are trying to run a regression? The first thing you should do then is to not set up your model to do classification.

There are multiple loss functions for regression; the most commonly used one is MSE (i.e. mean squared error).

And you should probably also use an IDENTITY activation function to go along with it.
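As a minimal sketch of that advice, the output layer from the question could be switched to MSE with an identity activation as shown here. The layer sizes are taken from the question. The listener part is an assumption on top of the advice: the stack trace goes through the classification `Evaluation` class, so this sketch passes a `RegressionEvaluation` to the listener instead, and `testData` is assumed to be the iterator from the question.

```groovy
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer
import org.deeplearning4j.optimize.api.InvocationType
import org.deeplearning4j.optimize.listeners.EvaluativeListener
import org.nd4j.evaluation.regression.RegressionEvaluation
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.lossfunctions.LossFunctions

// Regression head: MSE loss with an identity (linear) activation,
// replacing the MCXENT + SOFTMAX pair from the question.
def outputLayer = new RnnOutputLayer.Builder()
    .lossFunction(LossFunctions.LossFunction.MSE)
    .activation(Activation.IDENTITY)
    .nIn(50).nOut(4)   // sizes taken from the question's network
    .build()

// Assumption: testData is the test iterator from the question.
// Passing a RegressionEvaluation makes the listener report regression
// metrics instead of running the default classification evaluation.
def evalListener = new EvaluativeListener(
    testData, 1, InvocationType.EPOCH_END, new RegressionEvaluation())
```

The `outputLayer` would take the place of the second `.layer(...)` argument in the question's configuration, and `evalListener` would replace the `EvaluativeListener` passed to `net.setListeners(...)`.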

Overall, I suggest you go ahead and try to learn more about how neural networks work and what kinds of inputs they expect.

Deep neural networks can be a bottomless pit that will eat up all of your time with nothing to show for it if you just try to wing it.

It may seem contradictory, but the best way to save time when learning how to work with neural networks is by reading and understanding the basics first. Then you can try to follow and understand tutorials.

Only once you understand all of the concepts sufficiently well should you start to build your own networks and network architectures.

If you don’t care about all of that, you can always try to use pre-existing architectures and pre-trained models, or use a different machine learning approach altogether.

I’m sorry, you are right.
To do a quick test, I truncated all the train and test sequences to the same length and normalized all of them using MinMaxScaler. Then I used the MSE loss function and the IDENTITY activation for the RnnOutputLayer, and the NaNs went away (the score started at around 11 for the first iteration and went all the way down to 0.025 after 4 epochs). I have a little academic knowledge of neural networks, but in learning Deeplearning4j the difficulty for me is grasping some of the “outline” concepts (ETL for inputs that do not come from CSVs, configuration for evaluation vs. predictions, etc.).
Maybe when I become more proficient, I’ll use my own experience of the difficulties in learning these concepts, and of how they apply in the DL4J “language”, to submit some step-by-step tutorials for newcomers like me.
Thanks a lot for the help.
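The normalization step from the quick test above could look roughly like this. This is a sketch under assumptions, not the poster's actual code: the post only says "MinMaxScaler", and this example assumes DL4J's `NormalizerMinMaxScaler` attached to the iterators as a preprocessor, with `trainData` and `testData` being the iterators from the question.

```groovy
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerMinMaxScaler

// Assumption: trainData and testData are the DataSetIterators from the question.
def normalizer = new NormalizerMinMaxScaler()   // scales to the range [0, 1] by default
normalizer.fitLabel(true)    // also scale the labels, since this is a regression task
normalizer.fit(trainData)    // collect min/max statistics from the training data only
trainData.reset()

// Apply the same training statistics to both iterators.
trainData.setPreProcessor(normalizer)
testData.setPreProcessor(normalizer)
```

With the labels scaled as well, the network's predictions are in the normalized range; `normalizer.revertLabels(predictions)` can map them back to the original units.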

Feedback on that is always welcome :)

I know from firsthand experience that some of those things can be somewhat hard to get at first, and we want to improve the documentation. If you can make a list of the questions you’d like to see addressed, that would be great.

Ideally, you’ll add them here: Documentation Requests