Is LSTM working in DeepLearning4J?

MonNomIzNoGoud · May 3, 2026, 6:46pm

Hello. I was looking for a Java NN framework in order to smooth, rather than predict, the values of one time series based on a dozen or so other time series. I stumbled on SANNET, which looked promising, but unfortunately its GRU/LSTM layers are badly coded: there was no way to use them with time series (the author’s example created a layer for each date).

I decided to try out DL4J, but after a week of struggling with the different versions and documentation, it seems to me that the LSTM isn’t working either. There are too few examples, most of them written for older versions; based on these I tried a few approaches and still couldn’t make it work.

A basic approach was: I create the inputs in the form inputData = new double[5000][10][20] (5000 samples, which you call examples, 10 time series and a window of 20 dates). The output should have the form outputData = new double[5000][1] because I only have one output. So:

double[][][] inputData = createInputData(inputs, window);
double[][] outputData = createOutputData(outputs, window);
List list = new ArrayList(inputData.length);
for (int i=0; i<inputData.length; i++) list.add(new Pair(inputData[i], outputData[i]));
INDArrayDataSetIterator trainIter = new INDArrayDataSetIterator(list, 500);
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .list()
    .layer(0, new LSTM.Builder().nIn(inputs.length).nOut(50).activation(Activation.TANH).build())
    .layer(1, new RnnOutputLayer.Builder().nIn(50).nOut(outputs.length)
        .activation(Activation.LEAKYRELU).lossFunction(LossFunctions.LossFunction.MSE).build())
    .build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
int epochs = 1000;
for (int i=0; i<epochs; i++)
{
  model.fit(trainIter);
  if (i % 100 == 0) System.out.println(model.output(Nd4j.create(inputData)));
  trainIter.reset();
}

But nothing happens. I get the same output every time, and the training takes a fraction of a second. Something is wrong.
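For readers trying to reproduce the array layout described above, here is a minimal plain-Java sketch of how the [samples][features][window] input and the [samples][1] output could be built. The post does not show createInputData or createOutputData, so the class WindowSketch, its method names and the choice of the window's last date as the target are all assumptions, not the poster's code.

```java
// Hypothetical reconstruction of the windowing step; not the poster's createInputData.
final class WindowSketch {

    // series[f][t] is the value of input series f at date t.
    // Returns out[s][f][k] = series[f][s + k], i.e. [samples][features][window].
    static double[][][] slidingWindows(double[][] series, int window) {
        int features = series.length;
        int dates = series[0].length;
        int samples = dates - window + 1;
        if (window <= 0 || samples <= 0) {
            throw new IllegalArgumentException("window must be in 1..number of dates");
        }
        double[][][] out = new double[samples][features][window];
        for (int s = 0; s < samples; s++) {
            for (int f = 0; f < features; f++) {
                System.arraycopy(series[f], s, out[s][f], 0, window);
            }
        }
        return out;
    }

    // One target per window (assumption): the target value at the window's last date.
    static double[][] lastDateTargets(double[] target, int window) {
        int samples = target.length - window + 1;
        if (window <= 0 || samples <= 0) {
            throw new IllegalArgumentException("window must be in 1..number of dates");
        }
        double[][] out = new double[samples][1];
        for (int s = 0; s < samples; s++) {
            out[s][0] = target[s + window - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] series = {{1, 2, 3, 4}, {10, 20, 30, 40}};
        double[][][] x = slidingWindows(series, 3);
        System.out.println(x.length + " x " + x[0].length + " x " + x[0][0].length);
    }
}
```

With 10 input series and a window of 20, this layout has the same [samples][10][20] shape as the inputData array above.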

I tried another approach:

CollectionSequenceRecordReader inputData = createInputList(inputs, window);
CollectionSequenceRecordReader outputData = createOutputList(outputs, window);
SequenceRecordReaderDataSetIterator trainIter = new SequenceRecordReaderDataSetIterator(inputData, outputData, 500, 1, true);

With inputData and outputData having the same logic as before (Many-to-One, as you call it), I get an error message. It didn’t like the Many-To-One approach: the output is expected to have a history window of 20, which is not what an LSTM should do:

Exception in thread "main" java.lang.IllegalStateException: Sequence lengths do not match for RnnOutputLayer input and labels:Arrays should be rank 3 with shape [minibatch, size, sequenceLength] - mismatch on dimension 2 (sequence length) - input=[500, 50, 20] vs. label=[500, 1, 1]
at org.nd4j.common.base.Preconditions.throwStateEx(Preconditions.java:639)
at org.nd4j.common.base.Preconditions.checkState(Preconditions.java:337)
at org.deeplearning4j.nn.layers.recurrent.RnnOutputLayer.backpropGradient(RnnOutputLayer.java:59)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.calcBackpropGradients(MultiLayerNetwork.java:1984)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2799)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.computeGradientAndScore(MultiLayerNetwork.java:2742)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:174)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:61)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fitHelper(MultiLayerNetwork.java:1753)
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.fit(MultiLayerNetwork.java:1674)
at tests.DL4JTests.testLSTM(DL4JTests.java:117)

What have I done wrong ?

agibsonccc · May 4, 2026, 8:45am

@MonNomIzNoGoud could you outline some things?

For example, I’ve seen some people come by and say they were using a version from 7-10 years ago, for whatever reason, due to out-of-date tutorials.
Did you try with the most recent examples using M2.1?

That reflects the current API usage. Going forward, that API is in maintenance mode and will only be used for Keras import, with the expectation that users move to SameDiff afterwards. Could you just show your example in full?

That being said, that SHOULD still work.

MonNomIzNoGoud · May 4, 2026, 8:41pm

Hi agibsonccc, thank you for responding. I admit that I didn’t quite understand what you just said: will DL4J become obsolete and be replaced by SameDiff? I’m new to all this and I don’t want to dive into something that won’t be maintained anymore.

Going back to my post: I had a hard time getting the LSTM example working at all. I don’t know about the results yet, but I managed to get it working with the help of… AI. Although the answers of the different engines (Claude, Gemini and ChatGPT) had many flaws, I finally found that ChatGPT’s answer was the most robust.

What bothered me was that there was no true Many-To-One solution: the outputs (labels) do not have one value but an array of “sequenceLength” values of which only one is relevant, the last; the others are just a waste of resources, I think. Anyway, I tried to use the LastTimeStep wrapper at first, but this was not a viable solution because I couldn’t use RnnOutputLayer with back propagation, and maybe later with a convolution layer, so I settled for the output mask, which also allows the score to be computed only for the last point. So the challenge was to get the input/output arrays right. I’ll put my code at the end of the post in order to help other newbies looking for such a solution.
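The idea of padding the labels and masking everything but the last time step can be illustrated in plain Java, without DL4J. The class LastStepMaskSketch and its methods are hypothetical names chosen for this sketch, and the masked MSE only shows the principle; how DL4J actually normalises its score may differ.

```java
// Illustration only: plain arrays standing in for the labels and the label mask.
final class LastStepMaskSketch {

    // targets[i][o] -> labels[i][o][t], zero everywhere except t = window - 1.
    static double[][][] expandLabels(double[][] targets, int window) {
        double[][][] labels = new double[targets.length][targets[0].length][window];
        for (int i = 0; i < targets.length; i++) {
            for (int o = 0; o < targets[i].length; o++) {
                labels[i][o][window - 1] = targets[i][o];
            }
        }
        return labels;
    }

    // mask[i][t] = 1 only at the last time step of each sample.
    static double[][] lastStepMask(int samples, int window) {
        double[][] mask = new double[samples][window];
        for (int i = 0; i < samples; i++) {
            mask[i][window - 1] = 1.0;
        }
        return mask;
    }

    // Mean squared error over the time steps where the mask is non-zero.
    static double maskedMse(double[][][] predictions, double[][][] labels, double[][] mask) {
        double sum = 0.0;
        long count = 0;
        for (int i = 0; i < labels.length; i++) {
            for (int t = 0; t < mask[i].length; t++) {
                if (mask[i][t] == 0.0) {
                    continue; // padded step: ignored by the loss
                }
                for (int o = 0; o < labels[i].length; o++) {
                    double d = predictions[i][o][t] - labels[i][o][t];
                    sum += d * d;
                    count++;
                }
            }
        }
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        double[][] targets = {{2.0}, {4.0}};
        double[][][] labels = expandLabels(targets, 3);
        double[][] mask = lastStepMask(targets.length, 3);
        System.out.println(labels[0][0][2] + " " + mask[0][2]);
    }
}
```

The zero-filled steps never enter the loss, so their values are arbitrary; they only exist to give the labels the same sequence length as the network output.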

I want to test LSTM but also MLP/CNN/TCNN models for the same purpose. It’s a shame you don’t have a GRU layer. I read somewhere that you didn’t find that it gives better results than LSTM, but I also read that GRU layers use fewer resources than LSTM layers.

The code

double[][][] inputData = createInputData3D(inputs, window); //double[5000][10][20]
double[][][] outputData = createOutputData3D(outputs, window); //double[5000][1][20]
INDArray inputArr = Nd4j.create(inputData);
INDArray outputArr = Nd4j.create(outputData);
DataSet fullDataSet = new DataSet(inputArr, outputArr);
int samplesNb = outputs[0].size();
INDArray labelMask = Nd4j.zeros(samplesNb, window);
for (int i=0,last=window-1; i<samplesNb; i++) labelMask.putScalar(i, last, 1.0);  // last time step
fullDataSet.setLabelsMaskArray(labelMask);
DataSetIterator trainIter = new ViewIterator(fullDataSet, batchSize);
DataNormalization normaliser = new NormalizerMinMaxScaler(-1, 1);
normaliser.fitLabel(true);
normaliser.fit(trainIter);
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .miniBatch(true)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .updater(new Adam(0.0001))
    .list()
    .layer(0, new LSTM.Builder().nIn(inputs.length).nOut(50).weightInit(WeightInit.XAVIER).activation(Activation.TANH).build())
    .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE).weightInit(WeightInit.XAVIER).activation(Activation.IDENTITY).nIn(50).nOut(outputs.length).build())
    .backpropType(BackpropType.TruncatedBPTT).tBPTTForwardLength(batchSize).tBPTTBackwardLength(batchSize)
    .build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();

MonNomIzNoGoud · May 5, 2026, 6:13am

I forgot to answer your question about the version: I used the latest version for Java 8, M2 I think. I also stumbled into a serious bug. I know that using native DLLs/libs to accelerate the calculations and use the GPU is a good thing but these libraries should be secured. Coding in C/C++ and using pointers, allocations/deallocations, etc. could be dangerous. For example I had a JVM crash when I used the following code:

INDArray labels = Nd4j.zeros(batchSize, outputs.length, sequenceLength);
labels.put(new INDArrayIndex[] { NDArrayIndex.all(), NDArrayIndex.all(), NDArrayIndex.point(sequenceLength - 1) }, outputArr);

libnd4jcpu.dll crashed the JVM with the following stack trace:

J 1714 org.nd4j.linalg.cpu.nativecpu.bindings.Nd4jCpu.execTransformAny(Lorg/bytedeco/javacpp/PointerPointer;ILorg/nd4j/nativeblas/OpaqueDataBuffer;Lorg/bytedeco/javacpp/LongPointer;Lorg/bytedeco/javacpp/LongPointer;Lorg/nd4j/nativeblas/OpaqueDataBuffer;Lorg/bytedeco/javacpp/LongPointer;Lorg/bytedeco/javacpp/LongPointer;Lorg/bytedeco/javacpp/Pointer;)V (0 bytes) @ 0x00000000038ac57e [0x00000000038ac480+0xfe]
J 1703 C1 org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(Lorg/nd4j/linalg/api/ops/TransformOp;Lorg/nd4j/linalg/api/ops/OpContext;)V (1340 bytes) @ 0x00000000038c8ac4 [0x00000000038bc000+0xcac4]
J 1700 C1 org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(Lorg/nd4j/linalg/api/ops/Op;Lorg/nd4j/linalg/api/ops/OpContext;)Lorg/nd4j/linalg/api/ndarray/INDArray; (166 bytes) @ 0x00000000038adb64 [0x00000000038aca00+0x1164]
J 1699 C1 org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(Lorg/nd4j/linalg/api/ops/Op;)Lorg/nd4j/linalg/api/ndarray/INDArray; (7 bytes) @ 0x00000000038a0d44 [0x00000000038a0cc0+0x84]
j org.nd4j.linalg.api.ndarray.BaseNDArray.assign(Lorg/nd4j/linalg/api/ndarray/INDArray;)Lorg/nd4j/linalg/api/ndarray/INDArray;+12
j org.nd4j.linalg.api.ndarray.BaseNDArray.put([Lorg/nd4j/linalg/indexing/INDArrayIndex;Lorg/nd4j/linalg/api/ndarray/INDArray;)Lorg/nd4j/linalg/api/ndarray/INDArray;+63
j tests.DL4JTests.testLSTM()V+364
j tests.DL4JTests.main([Ljava/lang/String;)V+0
v ~StubRoutines::call_stub

agibsonccc · May 8, 2026, 8:16pm

@MonNomIzNoGoud sorry, what I am specifically saying is that the API you are using is not going to be updated anymore. It’s not really flexible and is not capable of doing many of the things people do with machine learning today.

Browse the examples or, since you use GPT, have it break down what I am explaining to you.

At least try M2.1 and see if that suits your needs.

Beyond that, yes: that is unfortunately the price you pay for performance.

You keep mentioning all these things we don’t have and I’m going to mention it again: we have the new API for a reason. I am not adding new layers to the old DL4J. These are the examples:

Don’t follow anything else. Use the newest version first; if that doesn’t work for you, I can help you try snapshots, which have more recent builds. Unfortunately, due to the rewrite of the past few years, I haven’t been able to release much till now.

@ my user name next time if you want a faster response. Since the forums don’t get much traffic right now, I only check in every few days or when I see emails, as I get time.

Thanks!

MonNomIzNoGoud · May 16, 2026, 3:31pm

@agibsonccc No worries, I’m practically new to all of this, even to being active in forums. When you’re 55 you shouldn’t be learning new stuff.

Regarding the version: I’m using M2, not M2.1, because M2.1 has no Java 1.8 resources. I’m stuck with old Java because I’m stuck with Eclipse Luna, which handles JavaScript projects better. And Oracle making new Java versions non-free bothers me.

I don’t think I’m going to dive into SameDiff; it seems to be a low-level API for much more advanced neural network computations. The higher-level API suits me fine for now. Unless you’re saying the higher-level API (MultiLayerConfiguration, etc.) is also getting replaced.

I’m trying to smooth market price data using NN. Normally you can’t beat old EWMA smoothing. I’ve tried everything there is on the net, even the “magic” smoothers some people sell, but combining EWMA+Holt-Winters beats them all. So I’m trying to beat it with NN.
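For readers unfamiliar with the EWMA baseline mentioned above, here is a minimal sketch of the standard recursion s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t-1]. The class name EwmaSketch and the choice of alpha are assumptions for illustration; the post does not say which parameters are used or how EWMA is combined with Holt-Winters, so only the plain EWMA is shown.

```java
// Plain exponentially weighted moving average; not the poster's EWMA+Holt-Winters combination.
final class EwmaSketch {

    // alpha in (0, 1]: larger values follow the raw series more closely.
    static double[] ewma(double[] x, double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        double[] s = new double[x.length];
        if (x.length == 0) {
            return s;
        }
        s[0] = x[0]; // seed with the first observation
        for (int t = 1; t < x.length; t++) {
            s[t] = alpha * x[t] + (1.0 - alpha) * s[t - 1];
        }
        return s;
    }

    public static void main(String[] args) {
        double[] prices = {100.0, 101.5, 99.0, 102.0};
        for (double v : ewma(prices, 0.3)) {
            System.out.println(v);
        }
    }
}
```

A smoother of this kind is causal (it only looks at past values), which is one reason it makes a natural baseline for a network fed with a trailing window.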
