(Copied from GitHub: “NaNs present in predictions INDArray”, Issue #8496, deeplearning4j/deeplearning4j)
Hi, I’m learning Deeplearning4j and tried using Groovy 3.0.7 to train an RNN for regression. It takes as input a series of “composite” values (4 doubles each), and I want to predict the next value (the next 4 doubles). I ran into this error, but I don’t know whether I’m doing something wrong or whether it is the same bug as in this issue:
2021-03-21 18:58:36.479 INFO 7402 — [ main] org.nd4j.linalg.factory.Nd4jBackend : Loaded [JCublasBackend] backend
2021-03-21 18:58:38.044 INFO 7402 — [ main] org.nd4j.nativeblas.NativeOpsHolder : Number of threads used for linear algebra: 32
2021-03-21 18:58:38.066 INFO 7402 — [ main] o.n.l.a.o.e.DefaultOpExecutioner : Backend used: [CUDA]; OS: [Linux]
2021-03-21 18:58:38.066 INFO 7402 — [ main] o.n.l.a.o.e.DefaultOpExecutioner : Cores: [8]; Memory: [5,9GB];
2021-03-21 18:58:38.066 INFO 7402 — [ main] o.n.l.a.o.e.DefaultOpExecutioner : Blas vendor: [CUBLAS]
2021-03-21 18:58:38.073 INFO 7402 — [ main] org.nd4j.linalg.jcublas.JCublasBackend : ND4J CUDA build version: 10.2.89
2021-03-21 18:58:38.074 INFO 7402 — [ main] org.nd4j.linalg.jcublas.JCublasBackend : CUDA device 0: [GeForce GTX 770]; cc: [3.0]; Total memory: [2095710208]
2021-03-21 18:58:38.106 INFO 7402 — [ main] o.d.nn.multilayer.MultiLayerNetwork : Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
2021-03-21 18:58:39.105 INFO 7402 — [ main] i.q.b.p.PricePredictionTraining : Network Training
2021-03-21 18:58:39.984 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 0 is 254339.3
2021-03-21 18:58:40.937 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 1 is 534874.65
2021-03-21 18:58:42.204 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 2 is 1822974.8
2021-03-21 18:58:42.550 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 3 is 520241.4
2021-03-21 18:58:43.041 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 4 is 493091.85
2021-03-21 18:58:43.237 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 5 is 146476.6375
2021-03-21 18:58:43.421 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 6 is 239147.6
2021-03-21 18:58:43.627 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 7 is 368370.975
2021-03-21 18:58:44.543 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 8 is 667049.2
2021-03-21 18:58:45.455 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 9 is 1610134.4
2021-03-21 18:58:46.028 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 10 is 5576194.0
2021-03-21 18:58:46.347 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 11 is 461675.9
2021-03-21 18:58:46.678 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 12 is 343023.0
2021-03-21 18:58:47.408 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 13 is 1121107.9
2021-03-21 18:58:48.307 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 14 is 695255.3
2021-03-21 18:58:48.915 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 15 is 938870.2
2021-03-21 18:58:49.351 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 16 is 1045590.5
2021-03-21 18:58:49.664 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 17 is 201749.4375
2021-03-21 18:58:50.572 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 18 is 9143387.2
2021-03-21 18:58:51.147 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 19 is 1.333519872E8
2021-03-21 18:58:51.721 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 20 is 1201941.6
2021-03-21 18:58:51.991 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 21 is 1184513.6
2021-03-21 18:58:52.603 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 22 is 1526626.7
2021-03-21 18:58:53.460 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 23 is 1333823.9
2021-03-21 18:58:54.149 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 24 is 911977.0
2021-03-21 18:58:54.672 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 25 is 238359.65
2021-03-21 18:58:55.435 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 26 is 1787077.6
2021-03-21 18:58:56.050 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 27 is 1045883.3
2021-03-21 18:58:56.897 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 28 is 1289983.6
2021-03-21 18:58:57.800 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 29 is 794034.05
2021-03-21 18:58:57.970 INFO 7402 — [ main] o.d.o.listeners.ScoreIterationListener : Score at iteration 30 is 434461.96875
2021-03-21 18:58:57.970 INFO 7402 — [ main] o.d.o.listeners.EvaluativeListener : Starting evaluation nr. 1
2021-03-21 18:58:58.859 INFO 7402 — [ main] ConditionEvaluationReportLoggingListener :
Caused by: java.lang.IllegalStateException: Cannot perform evaluation with NaNs present in predictions: 2147483647 NaNs present in predictions INDArray
at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:641) ~[nd4j-common-1.0.0-beta7.jar:na]
at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:286) ~[nd4j-common-1.0.0-beta7.jar:na]
at org.nd4j.evaluation.classification.Evaluation.eval(Evaluation.java:403) ~[nd4j-api-1.0.0-beta7.jar:1.0.0-beta7]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluationHelper(MultiLayerNetwork.java:3453) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.doEvaluation(MultiLayerNetwork.java:3400) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.optimize.listeners.EvaluativeListener.invokeListener(EvaluativeListener.java:236) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.optimize.listeners.EvaluativeListener.onEpochEnd(EvaluativeListener.java:213) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1727) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1636) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1623) ~[deeplearning4j-nn-1.0.0-beta7.jar:na]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork$fit.call(Unknown Source) ~[na:na]
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47) ~[groovy-3.0.7.jar:3.0.7]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125) ~[groovy-3.0.7.jar:3.0.7]
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:148) ~[groovy-3.0.7.jar:3.0.7]
I’ve built a simple SequenceRecordReaderDataSetIterator:
    def trainData = new SequenceRecordReaderDataSetIterator(
        new InMemorySequenceRecordReader(sequences),
        5,    // miniBatchSize
        4,    // numPossibleLabels
        4,    // labelIndex
        true  // regression
    )
where sequences is a list of all the time series used for training. Each record is composed of 8 values: the first 4 are the “current” timestep and the last 4 are the “next” timestep.
I build the network like this:
    def conf = new NeuralNetConfiguration.Builder()
        .weightInit(WeightInit.XAVIER)
        .updater(new Nadam())
        .list()
        .layer(
            new LSTM.Builder()
                .activation(Activation.TANH)
                .nIn(4).nOut(50)
                .build()
        )
        .layer(
            new RnnOutputLayer.Builder()
                .lossFunction(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX)
                .nIn(50).nOut(4)
                .build()
        )
        .build()
    def net = new MultiLayerNetwork(conf)
    net.init()
    net.setListeners(new ScoreIterationListener(1), new EvaluativeListener(testData, 1, InvocationType.EPOCH_END))
    net.fit(trainData, 4)
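To make the 8-value record layout described above concrete, here is a minimal sketch in plain Java (no DL4J involved). The class name RecordLayoutSketch and the method toRecords are hypothetical and only for illustration. It assumes each record pairs one timestep with the timestep that follows it in the same series; wrapping the values for InMemorySequenceRecordReader is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class RecordLayoutSketch {

    /**
     * Illustrative only: pairs each timestep with the following one.
     * Columns 0-3 hold the "current" 4 values, columns 4-7 the "next" 4 values.
     */
    public static List<List<Double>> toRecords(double[][] series) {
        List<List<Double>> records = new ArrayList<>();
        // The last timestep has no successor, so it only appears as a "next" value.
        for (int t = 0; t + 1 < series.length; t++) {
            List<Double> record = new ArrayList<>(8);
            for (double v : series[t]) {
                record.add(v);
            }
            for (double v : series[t + 1]) {
                record.add(v);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up values, three timesteps of 4 doubles each.
        double[][] series = {
            {1.0, 2.0, 3.0, 4.0},
            {5.0, 6.0, 7.0, 8.0},
            {9.0, 10.0, 11.0, 12.0}
        };
        for (List<Double> record : toRecords(series)) {
            System.out.println(record);
        }
    }
}
```

A series of N timesteps gives N - 1 records this way, since the final timestep has nothing to predict.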
Am I doing something wrong? Is there a workaround for this until the bug is officially resolved?
I’m using Gradle for the build, and these are my dependencies:
    implementation 'org.codehaus.groovy:groovy'
    implementation 'org.deeplearning4j:deeplearning4j-core:1.0.0-beta7'
    implementation 'org.deeplearning4j:deeplearning4j-cuda-10.2:1.0.0-beta7'
    implementation 'org.nd4j:nd4j-cuda-10.2-platform:1.0.0-beta7'