NaNs present in prediction

DarioArena87 · March 22, 2021, 6:19pm

(Copied from GitHub issue #8496, "NaNs present in predictions INDArray", in deeplearning4j/deeplearning4j.)
Hi, I'm learning Deeplearning4j and tried using Groovy 3.0.7 to train an RNN for regression. It takes as input series of “composite” values (4 doubles each), and I want to predict the next value (the next 4 doubles). I ran into this error, but I don't know whether I'm doing something wrong or whether it is the same bug as in this issue:

2021-03-21 18:58:36.479 INFO 7402 --- [main] org.nd4j.linalg.factory.Nd4jBackend : Loaded [JCublasBackend] backend
2021-03-21 18:58:38.044 INFO 7402 --- [main] org.nd4j.nativeblas.NativeOpsHolder : Number of threads used for linear algebra: 32
2021-03-21 18:58:38.066 INFO 7402 --- [main] o.n.l.a.o.e.DefaultOpExecutioner : Backend used: [CUDA]; OS: [Linux]
2021-03-21 18:58:38.066 INFO 7402 --- [main] o.n.l.a.o.e.DefaultOpExecutioner : Cores: [8]; Memory: [5,9GB];
2021-03-21 18:58:38.066 INFO 7402 --- [main] o.n.l.a.o.e.DefaultOpExecutioner : Blas vendor: [CUBLAS]
2021-03-21 18:58:38.073 INFO 7402 --- [main] org.nd4j.linalg.jcublas.JCublasBackend : ND4J CUDA build version: 10.2.89
2021-03-21 18:58:38.074 INFO 7402 --- [main] org.nd4j.linalg.jcublas.JCublasBackend : CUDA device 0: [GeForce GTX 770]; cc: [3.0]; Total memory: [2095710208]
2021-03-21 18:58:38.106 INFO 7402 --- [main] o.d.nn.multilayer.MultiLayerNetwork : Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
2021-03-21 18:58:39.105 INFO 7402 --- [main] i.q.b.p.PricePredictionTraining : Network Training
2021-03-21 18:58:39.984 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 0 is 254339.3
2021-03-21 18:58:40.937 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 1 is 534874.65
2021-03-21 18:58:42.204 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 2 is 1822974.8
2021-03-21 18:58:42.550 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 3 is 520241.4
2021-03-21 18:58:43.041 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 4 is 493091.85
2021-03-21 18:58:43.237 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 5 is 146476.6375
2021-03-21 18:58:43.421 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 6 is 239147.6
2021-03-21 18:58:43.627 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 7 is 368370.975
2021-03-21 18:58:44.543 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 8 is 667049.2
2021-03-21 18:58:45.455 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 9 is 1610134.4
2021-03-21 18:58:46.028 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 10 is 5576194.0
2021-03-21 18:58:46.347 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 11 is 461675.9
2021-03-21 18:58:46.678 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 12 is 343023.0
2021-03-21 18:58:47.408 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 13 is 1121107.9
2021-03-21 18:58:48.307 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 14 is 695255.3
2021-03-21 18:58:48.915 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 15 is 938870.2
2021-03-21 18:58:49.351 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 16 is 1045590.5
2021-03-21 18:58:49.664 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 17 is 201749.4375
2021-03-21 18:58:50.572 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 18 is 9143387.2
2021-03-21 18:58:51.147 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 19 is 1.333519872E8
2021-03-21 18:58:51.721 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 20 is 1201941.6
2021-03-21 18:58:51.991 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 21 is 1184513.6
2021-03-21 18:58:52.603 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 22 is 1526626.7
2021-03-21 18:58:53.460 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 23 is 1333823.9
2021-03-21 18:58:54.149 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 24 is 911977.0
2021-03-21 18:58:54.672 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 25 is 238359.65
2021-03-21 18:58:55.435 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 26 is 1787077.6
2021-03-21 18:58:56.050 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 27 is 1045883.3
2021-03-21 18:58:56.897 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 28 is 1289983.6
2021-03-21 18:58:57.800 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 29 is 794034.05
2021-03-21 18:58:57.970 INFO 7402 --- [main] o.d.o.listeners.ScoreIterationListener : Score at iteration 30 is 434461.96875
2021-03-21 18:58:57.970 INFO 7402 --- [main] o.d.o.listeners.EvaluativeListener : Starting evaluation nr. 1
2021-03-21 18:58:58.859 INFO 7402 --- [main] ConditionEvaluationReportLoggingListener :

Caused by: java.lang.IllegalStateException: Cannot perform evaluation with NaNs present in predictions: 2147483647 NaNs present in predictions INDArray
at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:641) ~[nd4j-common-1.0.0-beta7.jar:na]
at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:286) ~[nd4j-common-1.0.0-beta7.jar:na]
at org.nd4j.evaluation.classification.Evaluation.eval(Evaluation.java:403) ~[nd4j-api-1.0.0-beta7.jar:1.0.0-beta7]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluationHelper(MultiLayerNetwork.java:3453) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluation(MultiLayerNetwork.java:3400) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.optimize.listeners.EvaluativeListener.invokeListener(EvaluativeListener.java:236) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.optimize.listeners.EvaluativeListener.onEpochEnd(EvaluativeListener.java:213) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1727) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1636) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1623) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork$fit.call(Unknown Source) ~[na:na]
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47) ~[groovy-3.0.7.jar:3.0.7]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125) ~[groovy-3.0.7.jar:3.0.7]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148) ~[groovy-3.0.7.jar:3.0.7]

I've built a simple SequenceRecordReaderDataSetIterator:

def trainData = new SequenceRecordReaderDataSetIterator(
            new InMemorySequenceRecordReader(sequences),
            5,     // miniBatchSize
            4,     // numPossibleLabels
            4,     // labelIndex
            true   // regression
        )

where sequences is a list of all the time series used for training (each record has 8 values: the first 4 are the “current” timestep and the last 4 are the “next” timestep's value).
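For readers unfamiliar with this record layout, the sketch below shows one way such 8-value records could be built from a series of 4-value points: each record pairs point t (features, columns 0-3) with point t+1 (label, columns 4-7). It is plain Java, independent of DL4J; the class and method names (WindowRecords, toRecords) are hypothetical, it assumes every point has the same number of values, and the conversion to DataVec Writables for InMemorySequenceRecordReader is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WindowRecords {
    // Hypothetical helper: pair each point with its successor, giving
    // records of the form [current values..., next values...].
    public static List<double[]> toRecords(List<double[]> series) {
        List<double[]> records = new ArrayList<>();
        for (int t = 0; t + 1 < series.size(); t++) {
            double[] current = series.get(t);
            double[] next = series.get(t + 1);
            double[] record = new double[current.length + next.length];
            System.arraycopy(current, 0, record, 0, current.length);
            System.arraycopy(next, 0, record, current.length, next.length);
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<double[]> series = List.of(
                new double[]{1, 2, 3, 4},
                new double[]{5, 6, 7, 8},
                new double[]{9, 10, 11, 12});
        // A series of N points yields N - 1 records of 8 values each.
        for (double[] record : toRecords(series)) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```

Note that the last point of each series only ever appears as a label, never as an input.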

And I build the network like this:

def conf = new NeuralNetConfiguration.Builder()
            .weightInit(WeightInit.XAVIER)
            .updater(new Nadam())
            .list()
            .layer(
                new LSTM.Builder()
                    .activation(Activation.TANH)
                    .nIn(4).nOut(50)
                    .build()
            )
            .layer(
                new RnnOutputLayer.Builder()
                    .lossFunction(LossFunctions.LossFunction.MCXENT)
                    .activation(Activation.SOFTMAX)
                    .nIn(50).nOut(4)
                    .build()
            )
            .build()

def net = new MultiLayerNetwork(conf)
net.init()

net.setListeners(new ScoreIterationListener(1), new EvaluativeListener(testData, 1, InvocationType.EPOCH_END))

net.fit(trainData, 4)

Am I doing something wrong? Is there a workaround for this until the bug is officially resolved?

I'm using Gradle for the build, and these are my dependencies:
implementation 'org.codehaus.groovy:groovy'
implementation 'org.deeplearning4j:deeplearning4j-core:1.0.0-beta7'
implementation 'org.deeplearning4j:deeplearning4j-cuda-10.2:1.0.0-beta7'
implementation 'org.nd4j:nd4j-cuda-10.2-platform:1.0.0-beta7'

treo · March 22, 2021, 6:48pm

The error tells you pretty much what your problem is: You’ve got NaN predictions.

Looking at the scores you are getting during training, I'd guess your inputs are not normalized.
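As a rough illustration of what normalizing means here, the sketch below does per-column min-max scaling: the minimum and maximum are fitted on the training data only and then reused for any other data (test data, new inputs). It is plain Java written for this thread, not DL4J's own normalizer (ND4J ships preprocessors such as NormalizerMinMaxScaler for this); the class name MinMax and the sample numbers are hypothetical.

```java
import java.util.Arrays;

public class MinMax {
    private final double[] min;
    private final double[] max;

    // Fit the per-column minimum and maximum on the training rows only.
    public MinMax(double[][] trainRows) {
        int cols = trainRows[0].length;
        min = new double[cols];
        max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : trainRows) {
            for (int c = 0; c < cols; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }
    }

    // Scale one row with the fitted statistics; values seen during fitting land in [0, 1].
    // A constant column (max == min) is mapped to 0 to avoid dividing by zero.
    public double[] transform(double[] row) {
        double[] out = new double[row.length];
        for (int c = 0; c < row.length; c++) {
            double range = max[c] - min[c];
            out[c] = range == 0 ? 0.0 : (row[c] - min[c]) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] train = {{100, 2000}, {150, 2500}, {200, 3000}};
        MinMax scaler = new MinMax(train);
        System.out.println(Arrays.toString(scaler.transform(new double[]{150, 2500}))); // [0.5, 0.5]
    }
}
```

Fitting on the training data alone keeps information from the test set out of training; test values outside the training range simply fall slightly outside [0, 1].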

With multi-class cross-entropy, reasonable loss values are below 10, and ideally below 1.

Your loss is roughly 5 to 8 orders of magnitude above that, so I'd also guess that your expected output (the labels) is not a probability distribution.
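A small worked example of why the scale of the loss gives this away: the sketch below evaluates the cross-entropy formula loss = -Σ y_i · ln(p_i) for a single example with a uniform 4-class prediction. It is plain Java, not DL4J's MCXENT implementation, and the raw target values are made up. With a one-hot label the loss is ln 4 ≈ 1.39; with unnormalized targets in the thousands, the same prediction costs thousands, because every label value multiplies a log-probability.

```java
public class CrossEntropyScale {
    // Multi-class cross-entropy for one example: -sum_i y_i * ln(p_i).
    public static double crossEntropy(double[] labels, double[] probs) {
        double loss = 0.0;
        for (int i = 0; i < labels.length; i++) {
            loss -= labels[i] * Math.log(probs[i]);
        }
        return loss;
    }

    public static void main(String[] args) {
        double[] uniform = {0.25, 0.25, 0.25, 0.25};  // softmax output with no preference
        double[] oneHot = {0, 1, 0, 0};                // a valid class label
        double[] rawValues = {1500, 1510, 1490, 1505}; // hypothetical unnormalized regression targets
        System.out.println(crossEntropy(oneHot, uniform));    // ln(4), about 1.386
        System.out.println(crossEntropy(rawValues, uniform)); // 6005 * ln(4), in the thousands
    }
}
```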

I guess you are trying to run a regression? Then the first thing you should do is stop setting up your model for classification (MCXENT loss with a SOFTMAX output).

There are multiple loss functions for regression; the most commonly used one is MSE (mean squared error).

And you should probably also use an IDENTITY activation function to go along with it.
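For comparison, here is a plain-Java sketch of mean squared error on an identity output: with IDENTITY activation the output layer's values are used as the prediction unchanged (no softmax squashing them into probabilities) and compared directly with the target values. The class name and numbers are made up for illustration; in DL4J the corresponding settings are the MSE loss function and the IDENTITY activation on the output layer.

```java
public class MseIdentity {
    // Mean squared error over one prediction vector: mean of (prediction - target)^2.
    public static double mse(double[] predictions, double[] targets) {
        double sum = 0.0;
        for (int i = 0; i < targets.length; i++) {
            double diff = predictions[i] - targets[i];
            sum += diff * diff;
        }
        return sum / targets.length;
    }

    public static void main(String[] args) {
        double[] targets = {0.5, 0.25, 0.75, 1.0}; // next timestep, already scaled to [0, 1]
        double[] output = {0.5, 0.5, 0.75, 0.5};   // identity output: raw values, not probabilities
        System.out.println(mse(output, targets));  // (0 + 0.0625 + 0 + 0.25) / 4 = 0.078125
    }
}
```

Because MSE is measured in squared target units, unscaled targets in the thousands would again produce very large scores, which is one reason normalization and the choice of loss go hand in hand.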

Overall, I suggest you go ahead and try to learn more about how neural networks work and what kinds of inputs they expect.

Deep neural networks can be a bottomless pit that eats up all your time with nothing to show for it if you just try to wing it.

It may seem contradictory, but the best way to save time when learning how to work with neural networks is by reading and understanding the basics first. Then you can try to follow and understand tutorials.

Only when you understand all of these concepts sufficiently well should you start building your own networks and network architectures.

If you don’t care about all of that, you can always try to use pre-existing architectures and pre-trained models, or use a different machine learning approach altogether.

DarioArena87 · March 23, 2021, 12:23pm

I’m sorry, you are right.
To do a quick test, I truncated all the training and test sequences to the same length and normalized all of them using MinMaxScaler. Then I used the MSE loss function and the IDENTITY activation for the RnnOutputLayer, and the NaNs went away (the score started at around 11 in the first iteration and went down to 0.025 after 4 epochs). I have a little academic knowledge of neural networks, but while learning Deeplearning4j the difficult part for me is grasping some of the "outline" concepts (ETL for inputs that do not come from CSVs, configuring evaluation vs. prediction, etc.).
Maybe when I become more proficient, I'll use my experience with the difficulties I encountered learning these concepts, and how they apply in the DL4J "language", to submit some step-by-step tutorials for newcomers like me.
Thanks a lot for the help
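The quick test described in this reply starts by cutting every sequence to one common length. A minimal plain-Java sketch of that step, assuming each sequence is a list of timesteps: it keeps the first n timesteps of each sequence, where n is the length of the shortest one. The class and method names are hypothetical, and keeping the first rather than the last timesteps is an arbitrary choice here.

```java
import java.util.ArrayList;
import java.util.List;

public class TruncateSequences {
    // Cut every sequence to the length of the shortest one, keeping the first n timesteps.
    public static <T> List<List<T>> truncateToShortest(List<List<T>> sequences) {
        int n = Integer.MAX_VALUE;
        for (List<T> seq : sequences) {
            n = Math.min(n, seq.size());
        }
        List<List<T>> out = new ArrayList<>();
        for (List<T> seq : sequences) {
            out.add(new ArrayList<>(seq.subList(0, n))); // copy so the result is independent of the input
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> seqs = List.of(List.of(1, 2, 3, 4), List.of(5, 6), List.of(7, 8, 9));
        System.out.println(truncateToShortest(seqs)); // [[1, 2], [5, 6], [7, 8]]
    }
}
```

Truncation discards data from the longer sequences, so it is best seen as a shortcut for a first experiment rather than a general preprocessing step.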

treo · March 23, 2021, 12:30pm

Feedback on that is always welcome.

I know from first-hand experience that some of those things can be hard to grasp at first, and we want to improve the documentation. So if you can make a list of the questions you'd like to see addressed, that would be great.

Ideally, you’ll add them here: Documentation Requests

Related topics

Topic | Category | Replies | Views | Activity
Simple CNN predicts NaNs | DL4J | 3 | 503 | November 2, 2020
NaN on arm server | DL4J | 20 | 1467 | December 28, 2020
Arbiter: Cannot perform evaluation with NaNs present in predictions | DL4J | 3 | 532 | November 22, 2020
DL4J Need help with my input data | DL4J | 5 | 460 | December 25, 2020
Basic deeplearning4j classification example | DL4J | 4 | 1001 | February 3, 2020