Problems with understanding the topic - RNN for predicting values

Hello community,

I’m trying (for educational purposes only) to compare prediction accuracy between an ARIMA model and an LSTM network for short-term stock market prediction. ARIMA is based on the idea that past values of a time series can alone be used to predict future values (this part I’ve already built and got some results), and I want to recreate a similar setup with a neural network.

The data is in the usual stock market format, e.g.:

Date, Open, Close, Low, High, Volume
1991-04-16,100,100,100,100,325
1991-04-23,95.7,95.7,95.7,95.7,5905
1991-04-30,93.5,93.5,93.5,93.5,7162

I have removed the date column since it was causing a parsing exception (maybe I don’t have to do so?).

So here is my problem with understanding the subject: how do I prepare the neural network? It’s not a classification problem (unfortunately, those have plenty of examples). The main problem is that I don’t know what the labels should be, because as far as I understand I don’t have actual labels: every value is a feature (so I have 5 features: open, close, low, high and volume). Is the row from 1991-04-23 the label for the values from 1991-04-16?

I have started on an implementation, but I don’t know what to pass as numPossibleLabels and labelIndex in the iterator’s constructor (so I just put 0 there). As I understand from reading the docs, invoking feedForward should give me an array with the prediction (probably normalized, since I applied normalization to the iterator), but is that a prediction for as many days as batchSize is set to? Or as many as the tBPTT length? And how can I then extract the actual predicted value from it? Here is the sample of code that I’ve started working on.

    //read the data (the sample file above is comma-separated)
    SequenceRecordReader reader = new CSVSequenceRecordReader(1, ",");
    reader.initialize(new FileSplit(new File("wig_no_date.csv")));

    DataSetIterator iterator = new SequenceRecordReaderDataSetIterator(reader, properties.getBatchSize(),
            0, 0, true);
    MultiLayerNetwork net = RecurrentNetwork.buildLstmNetworks(iterator.inputColumns(), iterator.totalOutcomes());
    //normalize the data to the range 0-1
    NormalizerMinMaxScaler minMaxScaler = new NormalizerMinMaxScaler();
    minMaxScaler.fit(iterator);
    iterator.reset(); //fit() exhausts the iterator, so reset it before training
    iterator.setPreProcessor(minMaxScaler);

    for (int i = 0; i < properties.getEpochs(); i++) {
        while (iterator.hasNext()) {
            net.fit(iterator.next());
        }
        iterator.reset();
        net.rnnClearPreviousState();
    }

    List<INDArray> prediction = net.feedForward();

and the config for the network:

 public static MultiLayerNetwork buildLstmNetworks(int nIn, int nOut) {
    MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .seed(seed)
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
            .weightInit(WeightInit.XAVIER)
            .updater(Updater.RMSPROP)
            .l2(1e-4)
            .list()
            .layer(0, new LSTM.Builder()
                    .nIn(nIn)
                    .nOut(lstmLayer1Size)
                    .activation(Activation.TANH)
                    .gateActivationFunction(Activation.HARDSIGMOID)
                    .dropOut(dropoutRatio)
                    .build())
            .layer(1, new LSTM.Builder()
                    .nIn(lstmLayer1Size)
                    .nOut(lstmLayer2Size)
                    .activation(Activation.TANH)
                    .gateActivationFunction(Activation.HARDSIGMOID)
                    .dropOut(dropoutRatio)
                    .build())
            .layer(2, new DenseLayer.Builder()
                    .nIn(lstmLayer2Size)
                    .nOut(denseLayerSize)
                    .activation(Activation.RELU)
                    .build())
            .layer(3, new RnnOutputLayer.Builder()
                    .nIn(denseLayerSize)
                    .nOut(nOut)
                    .activation(Activation.IDENTITY)
                    .lossFunction(LossFunctions.LossFunction.MSE)
                    .build())
            .backpropType(BackpropType.TruncatedBPTT)
            .tBPTTLength(truncatedBPTTLength)
            .build();

    MultiLayerNetwork net = new MultiLayerNetwork(conf);
    net.init();
    net.setListeners(new ScoreIterationListener(100));
    return net;
}

@tieburach what you’d be looking for is regression.

Here’s our docs on rnns:
https://deeplearning4j.konduit.ai/models/recurrent

Of note here are normalized labels. Make sure to call minMaxScaler.fitLabel(true) so the labels are normalized along with the features (and can be reverted later).
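To see what that normalization buys you conceptually: the scaler learns the min/max of each column, maps values into [0, 1] for training, and the same statistics let you map predictions back to price scale afterwards. A minimal plain-Java sketch of that round trip (just the underlying arithmetic, not the DL4J API; the min/max values are taken from the sample rows in the question):

```java
public class MinMaxDemo {
    // scale a value into [0, 1] given the observed min/max of its column
    static double normalize(double x, double min, double max) {
        return (x - min) / (max - min);
    }

    // invert the scaling - this is what reverting labels does to network output
    static double revert(double y, double min, double max) {
        return y * (max - min) + min;
    }

    public static void main(String[] args) {
        double min = 93.5, max = 100.0;               // from the sample rows
        double scaled = normalize(95.7, min, max);    // value fed to the network
        double back = revert(scaled, min, max);       // recovered price, ~95.7
        System.out.println(scaled + " -> " + back);
    }
}
```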

We have a few small regression samples for you to look at as well:

To your specific question about how the length is set up: generally, batch size is the number of time series examples you want to process at a time.

Your number of days is the time series length.
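And yes, your guess about labels is right: for next-step regression the label for day t is simply the row for day t+1, i.e. the series shifted by one step. A plain-Java sketch of building those (features, label) pairs, using the three sample rows from the question (this is just the idea; in DL4J the sequence iterator does this alignment for you):

```java
import java.util.Arrays;

public class WindowDemo {
    // Build (features, labels) for next-step prediction:
    // the feature vector for day t is labeled with the values of day t+1.
    static double[][][] makePairs(double[][] rows) {
        int n = rows.length - 1;              // the last row has no "next day"
        double[][] features = new double[n][];
        double[][] labels = new double[n][];
        for (int t = 0; t < n; t++) {
            features[t] = rows[t];
            labels[t] = rows[t + 1];
        }
        return new double[][][]{features, labels};
    }

    public static void main(String[] args) {
        double[][] rows = {
            {100.0, 100.0, 100.0, 100.0, 325},   // 1991-04-16
            {95.7, 95.7, 95.7, 95.7, 5905},      // 1991-04-23
            {93.5, 93.5, 93.5, 93.5, 7162}       // 1991-04-30
        };
        double[][][] pair = makePairs(rows);
        // the label for 1991-04-16 is the 1991-04-23 row, and so on
        System.out.println(Arrays.deepToString(pair[1]));
    }
}
```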

As for how many… there are no hard rules there. Figure out what works. I would say that generally, predicting anything long range is going to be fraught with failure. Try short intervals first and test for success there. Expect it to fail.

As for the expected result, you just need to use net.output(…) here. That gives you the final output ndarray.
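The output of an RNN output layer is a 3-D array of shape [miniBatchSize, nOut, timeSteps], so the prediction for the most recent day is the last time step. A plain-array sketch of picking that value out of a row-major buffer (hypothetical toy shapes, not DL4J code; on a real INDArray you would use getDouble with the same indices):

```java
public class OutputDemo {
    // For a row-major buffer of shape [batch, nOut, steps],
    // return the value at (b, f, steps - 1) - the newest prediction.
    static double lastStep(double[] buf, int batch, int nOut, int steps,
                           int b, int f) {
        return buf[(b * nOut + f) * steps + (steps - 1)];
    }

    public static void main(String[] args) {
        // one example, 2 output features, 3 time steps
        double[] buf = {0.1, 0.2, 0.3,   // feature 0 over time
                        0.4, 0.5, 0.6};  // feature 1 over time
        System.out.println(lastStep(buf, 1, 2, 3, 0, 0)); // 0.3
        System.out.println(lastStep(buf, 1, 2, 3, 0, 1)); // 0.6
    }
}
```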

Remember to call revertLabels on the network output to get your outcomes back as real numbers.