January 12 Update
I experimented with CSVVariableSlidingWindowRecordReader and managed to complete a training run (after some data format changes), but ultimately decided it wasn't the right choice, since it iterates over window sizes up to a maximum and I want a fixed window size of 12.

I have switched to CSVSequenceRecordReader, similar to the UCISequenceClassification example. I also changed my data format so that custom/train/features/0.csv contains the integers 0 to 11, one per line, and the label for that sequence (12) is stored in custom/train/labels/0.csv. I hope this new format works for regression problems, though I can't say for sure yet; if you know a better way, @treo, I'd welcome your feedback. I was hoping to use a single file with a moving window to avoid duplicating time series segments across multiple files (e.g. 0 to 11 => 12 and 1 to 12 => 13 both contain the segment 1 to 12), but I just couldn't figure it out, so I'm falling back to this more verbose format.
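For concreteness, here is a rough sketch of how those per-window files could be generated (not code from my project, just an illustration): the paths match the ones below, the window length of 12 is my setting, and numWindows = 2 is an assumption chosen to line up with the NumberedFileInputSplit range further down.

import java.nio.file.{Files, Paths}

// Sketch of the verbose layout described above: one features file and one labels file per window.
val windowSize = 12
val numWindows = 2 // assumption for illustration only
val featuresDir = Paths.get("src/main/resources/custom/train/features")
val labelsDir = Paths.get("src/main/resources/custom/train/labels")
Files.createDirectories(featuresDir)
Files.createDirectories(labelsDir)
for (i <- 0 until numWindows) {
  // features/i.csv: the integers i .. i+11, one per line
  Files.write(featuresDir.resolve(s"$i.csv"), (i until i + windowSize).mkString("\n").getBytes)
  // labels/i.csv: the single target value i+12
  Files.write(labelsDir.resolve(s"$i.csv"), (i + windowSize).toString.getBytes)
}

With the files in place, the reader and iterator setup is: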
import java.io.File
import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader
import org.datavec.api.split.NumberedFileInputSplit
import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator

val baseDir = new File("src/main/resources/custom/")
val baseTrainDir = new File(baseDir, "train")
val featuresDirTrain = new File(baseTrainDir, "features")
val labelsDirTrain = new File(baseTrainDir, "labels")

// One sequence per numbered file pair; NumberedFileInputSplit(..., 0, 1) reads 0.csv and 1.csv (indices are inclusive)
val trainFeatures = new CSVSequenceRecordReader()
trainFeatures.initialize(new NumberedFileInputSplit(featuresDirTrain.getAbsolutePath + "/%d.csv", 0, 1))
val trainLabels = new CSVSequenceRecordReader()
trainLabels.initialize(new NumberedFileInputSplit(labelsDirTrain.getAbsolutePath + "/%d.csv", 0, 1))

val miniBatchSize = 1
val regression = true
val numPossibleLabels = -1 // ignored when regression = true
val labelIndex = 0 // not used by this iterator constructor

val dataSetIterator =
  new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels, miniBatchSize, numPossibleLabels, regression)

// Print each DataSet once to inspect the contents, then reset before training
while (dataSetIterator.hasNext) {
  println(dataSetIterator.next())
}
dataSetIterator.reset()
The dataSetIterator seems to have the expected contents:
===========INPUT===================
[[[ 0, 1.0000, 2.0000, 3.0000, 4.0000, 5.0000, 6.0000, 7.0000, 8.0000, 9.0000, 10.0000, 11.0000]]]
=================OUTPUT==================
[[[12.0000]]]
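To double-check the array shapes behind that printout, here is a quick sanity check I can run (my own addition; getFeatures and getLabels are standard ND4J DataSet methods):

// Features should be rank 3 with shape [minibatch, nIn, sequenceLength] = [1, 1, 12];
// the labels array should be [1, 1, 1] for the single target value.
val ds = dataSetIterator.next()
println(java.util.Arrays.toString(ds.getFeatures.shape()))
println(java.util.Arrays.toString(ds.getLabels.shape()))
dataSetIterator.reset()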
The new exception I’m hunting down occurs during training:
import org.deeplearning4j.nn.api.OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{LSTM, RnnOutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit.XAVIER
import org.deeplearning4j.optimize.listeners.ScoreIterationListener
import org.nd4j.linalg.activations.Activation.{IDENTITY, TANH}
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction.MSE

val hiddenLayerWidth = 30
val net = new MultiLayerNetwork(
  new NeuralNetConfiguration.Builder()
    .optimizationAlgo(STOCHASTIC_GRADIENT_DESCENT)
    .seed(123)
    .weightInit(XAVIER)
    .list
    .layer(
      // LSTM layer: one input feature per time step, 30 hidden units
      new LSTM.Builder()
        .activation(TANH)
        .nIn(1)
        .nOut(hiddenLayerWidth)
        .build())
    .layer(
      // Regression output layer: identity activation with MSE loss
      new RnnOutputLayer.Builder(MSE)
        .activation(IDENTITY)
        .nIn(hiddenLayerWidth)
        .nOut(1)
        .lossFunction(MSE) // redundant with Builder(MSE), but harmless
        .build()
    )
    .build())
net.init()

logger.info("Starting training...") // logger is defined elsewhere in my class
net.setListeners(new ScoreIterationListener(1))
net.fit(dataSetIterator, 50)
The exception is:
Sequence lengths do not match for RnnOutputLayer input and labels:Arrays should be rank 3 with shape [minibatch, size, sequenceLength] - mismatch on dimension 2 (sequence length) - input=[1, 30, 12] vs. label=[1, 1, 1]
The error message is fairly clear, though I don’t know how to resolve it yet. Help would be welcomed. I’ll come back again tomorrow with another status update.
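One lead I haven't tried yet: if I remember correctly, the UCISequenceClassification example passes an AlignmentMode into SequenceRecordReaderDataSetIterator so that a single per-sequence label is aligned (and masked) to the last time step, which sounds like exactly this kind of length mismatch. A sketch, untested on my side, reusing the variables from above:

import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator.AlignmentMode

// Untested: same readers and settings as before, but with ALIGN_END so the single
// label value lines up with the final time step of the 12-step input sequence.
val alignedIterator = new SequenceRecordReaderDataSetIterator(
  trainFeatures, trainLabels, miniBatchSize, numPossibleLabels, regression,
  AlignmentMode.ALIGN_END)

If anyone can confirm whether that's the right direction, I'd appreciate it.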