Training data shaping

I am new to deep learning in general. I am trying to predict a sequence. My data is shaped as:

      unix    |         change          |       low_margin       |      high_margin       
1596322800000 |  0.12413154238073182029 | 0.88003705419175544233 | 0.15312164612708634025
1596326400000 | -0.27751567963589942832 | 0.60017439379603346877 | 0.46807644631921703576
1596330000000 |      1.7769698768363259 | 0.09413488648167383885 | 0.23692363768908328777
1596333600000 | -0.45876993166287015945 | 0.68331708567322522918 | 0.16993166287015945330

I would like to build a net to predict the next change and, if possible, the next high_margin and low_margin.

The data comes from an SQL DB and is read with:

JDBCRecordReader reader = new JDBCRecordReader(sql, ds);
reader.initialize(null);
RecordReaderDataSetIterator trainIter = new RecordReaderDataSetIterator(reader, BATCH_SIZE, 1, 1, true);

The model I am starting with:

	int lstmLayerSize = 256;

	LSTM lstmLayer1 = new LSTM.Builder()//
		.nIn(lstmLayerSize)
		.nOut(lstmLayerSize)
		.activation(Activation.TANH)
		.gateActivationFunction(Activation.HARDSIGMOID)
		.dropOut(dropoutRatio)
		.build();

	DenseLayer denseLayer = new DenseLayer.Builder()//
		.nIn(lstmLayerSize)
		.nOut(lstmLayerSize)
		.activation(Activation.RELU)
		.build();

	RnnOutputLayer rnnOutputLayer = new RnnOutputLayer.Builder(LossFunction.MSE).activation(Activation.IDENTITY)
		.nIn(200)
		.nOut(52)
		.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
		.gradientNormalizationThreshold(10)
		.build();

	MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()//
		.seed(seed)
		.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
		.weightInit(WeightInit.XAVIER)
		.updater(new Nesterovs(learningRate, 0.9))
		.l2(1e-4)
		.list()
		.layer(0, lstmLayer1)
		.layer(1, denseLayer)
		.layer(2, rnnOutputLayer)
		.build();

	MultiLayerNetwork net = new MultiLayerNetwork(conf);
	net.init();
	net.setListeners(new ScoreIterationListener(10));

It is not clear to me how to reshape the data coming from the SQL DB. The page Recurrent Neural Network - Deeplearning4j describes features, examples, examplesNum, inputSize, etc. I am lost trying to understand what timeSeriesLength, features, examples, inputSize and the rest of the terminology mean for my data. So my question is: can someone please clarify these terms and explain how to shape the data properly?

Thank you

Take a look at Quickstart with Deeplearning4J – dubs·tech to learn more about transforming the data for training in general.

Your main problem here though is that JDBCRecordReader is not meant to be used for sequences, so each row in your result set is going to be a single example.

Going just by the information you’ve given with your example data, I guess that you have just a single sequence of values and you want to use it to train your model to predict the future. This means you are trying to build an Autoregressive Model.

I’ll try to explain the concepts in the context of an autoregressive model then:

  • features: Your variables at time t
  • labels: Your variables at time t+1
  • examples: a set of features and labels
  • input size: number of features
  • output size: number of labels
  • time series length: the number of steps you give as context before predicting the next step
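To make the terms above concrete, here is a minimal sketch in plain Java arrays (no DL4J involved). It assumes the series is already loaded as rows of [time step][variable]; the class name ShapeSketch and its methods are made up for illustration. It lays the data out as [examples][input size][time series length], with the labels being the same variables shifted by one step.

```java
/** Illustrative only: maps the terms above onto plain Java array shapes. */
final class ShapeSketch {

    /** Features cover steps 0..T-2, shaped [examples=1][inputSize][timeSeriesLength]. */
    static double[][][] features(double[][] rows) {
        return shifted(rows, 0);
    }

    /** Labels are the same variables one step later, i.e. steps 1..T-1. */
    static double[][][] labels(double[][] rows) {
        return shifted(rows, 1);
    }

    private static double[][][] shifted(double[][] rows, int offset) {
        int length = rows.length - 1; // time series length
        int size = rows[0].length;    // input size (equals output size here)
        double[][][] out = new double[1][size][length];
        for (int t = 0; t < length; t++) {
            for (int v = 0; v < size; v++) {
                out[0][v][t] = rows[t + offset][v];
            }
        }
        return out;
    }
}
```

With three variables and T rows, both arrays have the shape [1][3][T-1]: one example, three features (or labels), and T-1 time steps.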

There are many ways you can approach an autoregressive model.

The easiest way is to take your variables at time t as your features and their value at time t + 1 as the labels. This can be done with a simple feed forward network.

That approach has the problem that it doesn’t have any idea about the past. If you want to incorporate knowledge about previous steps, you can use the values of your variables at time t - n, t - (n - 1), …, t as the features and predict the value at time t + 1. That too can still be a simple feed forward network.

Extending the inputs to incorporate additional knowledge about the past is nice, but it may be limiting, because you can’t handle arbitrarily long sequences that way. So if you need to handle arbitrary sequence lengths, you can use either a recurrent network or a 1D convolutional network.
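As a rough sketch of that windowed setup, in plain Java and without any DL4J classes: each example flattens a fixed number of consecutive steps into one feature vector, and the label is the step that follows the window. The class and method names here are hypothetical, and the window parameter counts steps (so it corresponds to n + 1 in the notation above).

```java
/** Illustrative only: fixed-size windows of past steps for a feed forward network. */
final class WindowSketch {

    /** One row per example: `window` consecutive steps flattened into a single vector. */
    static double[][] windowFeatures(double[][] rows, int window) {
        int examples = rows.length - window;
        int vars = rows[0].length;
        double[][] out = new double[examples][window * vars];
        for (int e = 0; e < examples; e++) {
            for (int s = 0; s < window; s++) {
                System.arraycopy(rows[e + s], 0, out[e], s * vars, vars);
            }
        }
        return out;
    }

    /** The label for each example is the step right after its window. */
    static double[][] windowLabels(double[][] rows, int window) {
        int examples = rows.length - window;
        double[][] out = new double[examples][];
        for (int e = 0; e < examples; e++) {
            out[e] = rows[e + window].clone();
        }
        return out;
    }
}
```

The input size of the network is then window times the number of variables, and the output size is the number of variables.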

A recurrent network will work through the sequence in order, while a convolutional network has a more relaxed relationship with ordering (and will typically require at least a few past steps to be even applicable).

The convolutional approach will be more similar to the regular feed forward network, as it takes in a sequence and will return a single value.

The recurrent approach will take a sequence and can return either a single value or a sequence of values. When you have a sequence of values as both the features and the labels, you essentially start your feature sequence at time t=0 and the label sequence at time t=1. That would produce a single, probably very long, example for your network.

Having a single long example has many downsides, one of them being that it requires a lot of resources and isn’t very efficient when training. Instead it usually makes sense to split the sequence into shorter sub-sequences, as that gives you a more parallelized training that requires less resources and usually trains better.
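Splitting can be as simple as cutting the rows into consecutive chunks. The following is a minimal plain-Java sketch under that assumption (non-overlapping chunks, with a shorter final chunk kept as-is); SplitSketch and its parameters are made-up names, and each chunk could then be written to its own file for a sequence reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: cuts one long sequence into shorter, non-overlapping sub-sequences. */
final class SplitSketch {

    static List<double[][]> split(double[][] rows, int chunkLength) {
        if (chunkLength <= 0) {
            throw new IllegalArgumentException("chunkLength must be positive");
        }
        List<double[][]> chunks = new ArrayList<>();
        for (int start = 0; start < rows.length; start += chunkLength) {
            int end = Math.min(start + chunkLength, rows.length);
            // copyOfRange copies the outer array; the row arrays themselves are shared
            chunks.add(Arrays.copyOfRange(rows, start, end));
        }
        return chunks;
    }
}
```

Whether to drop, pad, or keep the shorter last chunk is a design choice; this sketch simply keeps it.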

Another thing to note is that recurrent networks tend to “forget” the past beyond about 15 steps for LSTMs and about 5 to 7 steps for simpler RNNs. So having extremely long sequences doesn’t necessarily make sense from a “give it as much context as possible” standpoint either.

In your case longer sequences should still work, because you’ve got an autoregressive situation, but keep that in mind anyway.

After this wall of text, I’d suggest that you simply prepare your data into a format that a pre-existing SequenceRecordReader can use, and go from there.


Paul, thank you a lot for helping.
I exported the data to a CSV file for now, until I get things working and understand what is going on. I read the article you wrote. Let’s focus on LSTM for now, just to get something working; then I will experiment with other models.

Honestly, I am still struggling with shaping the data:

	int BATCH_SIZE = 5;

	CSVSequenceRecordReader reader = new CSVSequenceRecordReader(1, ",");
	Schema schema = new Schema.Builder().addColumnDouble("lower_margin")
		.addColumnDouble("upper_margin")
		.addColumnDouble("diff")
		.build();


	reader.initialize(new FileSplit(new File("data.csv")));

	SequenceRecordReaderDataSetIterator trainIter;

	trainIter = new SequenceRecordReaderDataSetIterator(reader, BATCH_SIZE, 3, 0, true);

	System.out.println(trainIter.getLabels());

	final double learningRate = 0.05;
	final double dropoutRatio = 0.2;

	LSTM lstmLayer0 = new LSTM.Builder()//
		.nIn(3)
		.nOut(64)
		.activation(Activation.TANH)
		.dropOut(dropoutRatio)
		.build();

	RnnOutputLayer rnnOutputLayer = new RnnOutputLayer.Builder(LossFunction.MSE).activation(Activation.IDENTITY)
		.nIn(64)
		.nOut(3)
		.gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
		.gradientNormalizationThreshold(10)
		.build();

	MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()//
		.seed(1234)
		.weightInit(WeightInit.XAVIER)
		.updater(new Nesterovs(learningRate, 0.9))
		.l2(1e-4)
		.list()
		.layer(0, lstmLayer0)
		.layer(1, rnnOutputLayer)
		.build();

	MultiLayerNetwork net = new MultiLayerNetwork(conf);
	net.init();
	net.setListeners(new ScoreIterationListener(10));

	net.fit(trainIter);

	trainIter.reset();
	if (trainIter.hasNext()) {

		DataSet d = trainIter.next();

		System.out.println(d.getFeatures());
		System.out.println(d.getLabels());

		INDArrayIndex idx1 = NDArrayIndex.all();
		INDArrayIndex idx2 = NDArrayIndex.all();
		INDArrayIndex idx3 = NDArrayIndex.all();

		INDArrayIndex intrval = NDArrayIndex.interval(0, 1);

		INDArray features = d.getFeatures();
		INDArray lastData = features.get(idx1, idx2, idx3, intrval);

		INDArray pred = net.rnnTimeStep(lastData);

		System.out.println(pred);

	}

I have adjusted my CSV to contain only the features/labels, without the Unix timestamp:

lower_margin,upper_margin,diff
0.88003705419175544233,0.15312164612708634025,0.12413154238073182029
0.60017439379603346877,0.46807644631921703576,-0.27751567963589942832
0.09413488648167383885,0.23692363768908328777,1.7769698768363259
0.68331708567322522918,0.16993166287015945330,-0.45876993166287015945
0.74833170088701335555,0.39282417045310548325,1.1341687796929780
....

Of course this code is not working as I am getting:

Exception in thread "main" java.lang.IllegalStateException: Illegal set of indices
for array: need at least 3 point/interval/all/specified indices for rank 3 array ([1, 2, 5635]),
got indices [all(), all(), all(), Interval(b=0,e=1,s=1)]

This is related more to the prediction part and how to use the model. However, if I print a single dataset from the iterator, I get a 2D array of the first two columns from the CSV file. I was expecting an array with 3 rows, so I am not sure what is happening here.

Again, what I am trying to achieve is to provide the last few time steps (let’s say 90 time steps) and predict the next 20 (for example). So the time series length will be 90. I am just using smaller numbers in my example to get up and running.

The error tells you that you are trying to index into an array that has 3 dimensions using 4 indices.

Looking at the value from your error, it looks exactly like I’d expect to see here. You’ve got one (1) sequence with two (2) features and that sequence has 5635 steps.

You have two features and one label with your current setup. I guess you want to have three features and three labels, such that the features at step t + 1 are the labels for step t.

As you are struggling with the setup, I guess that the easiest way for you to do that would be to just generate the csv file to be exactly like that, i.e. 6 columns, first 3 have the inputs and last 3 have the outputs.
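Generating such a file takes only a few lines. Here is a minimal sketch in plain Java that assumes the 3-column lines are already in memory without the header row; the class name ShiftedCsvSketch is made up for illustration. Each output line is a row followed by the row after it, so the last input row has no label and is dropped.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: pairs each CSV row with the next one to get inputs + outputs per line. */
final class ShiftedCsvSketch {

    /** Input: data lines (no header). Output: one line per step, "features,labels". */
    static List<String> toShiftedLines(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < lines.size(); i++) {
            // columns of step i are the inputs, columns of step i+1 are the outputs
            out.add(lines.get(i).trim() + "," + lines.get(i + 1).trim());
        }
        return out;
    }
}
```

Reading the input with java.nio.file.Files.readAllLines and writing the result with Files.write would be one way to wire this up to actual files.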

As for using that data to also run prediction:

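If you look at the javadoc for rnnTimeStep, it describes the input shape it expects: [miniBatchSize, inputSize] for a single time step, or [miniBatchSize, inputSize, timeSeriesLength] for multiple time steps.

So you have to feed it with data that has the expected shape.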

Let’s assume that you have managed to set up your data the way I suggested above (3 input values and 3 output values). Then calling .getFeatures on the dataset will already produce an array that has the shape [1, 3, 5635]. If you want to take just the first 90 time steps of it, you can do it like this:

features.get(NDArrayIndex.all(), NDArrayIndex.all(), NDArrayIndex.interval(0, 90))

And it will have the shape [1, 3, 90].

You can then feed it to .rnnTimeStep and you will get outputs for all 90 steps. To get the next 20 steps, you need to take the output for just the last step (if you want to be lazy, feed it the first 89 steps, then the 90th on its own, so you don’t have to do any sub-array access) and use it as the input for the next call to .rnnTimeStep. You repeat this until you have collected all 20 steps into the future.
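The loop described above looks roughly like this. To keep the sketch self-contained it does not call DL4J at all: the UnaryOperator argument is a stand-in for a stateful single-step call such as net.rnnTimeStep, and RolloutSketch, forecast, context and horizon are names invented for this example.

```java
import java.util.function.UnaryOperator;

/** Illustrative only: feeds each prediction back in as the next input. */
final class RolloutSketch {

    /**
     * @param stepModel stand-in for a stateful one-step prediction call
     * @param context   known past steps, fed one at a time to build up state
     * @param horizon   number of future steps to collect
     */
    static double[][] forecast(UnaryOperator<double[]> stepModel, double[][] context, int horizon) {
        if (context.length == 0) {
            throw new IllegalArgumentException("context must contain at least one step");
        }
        double[] last = null;
        for (double[] step : context) {
            last = stepModel.apply(step); // only the output for the final context step is kept
        }
        double[][] future = new double[horizon][];
        for (int i = 0; i < horizon; i++) {
            future[i] = last;                 // prediction for the next unseen step
            if (i + 1 < horizon) {
                last = stepModel.apply(last); // feed the prediction back in
            }
        }
        return future;
    }
}
```

With a real network the stored recurrent state would normally be cleared before starting a new, unrelated sequence.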

@treo
Thank you a lot for all the help. I highly appreciate it.
I followed your advice by creating a feature/labels file of 6 columns:

0.88003705419175544233,0.15312164612708634025,0.12413154238073182029,  0.60017439379603346877,0.46807644631921703576,-0.27751567963589942832
0.60017439379603346877,0.46807644631921703576,-0.27751567963589942832, 0.09413488648167383885,0.23692363768908328777,1.7769698768363259
0.09413488648167383885,0.23692363768908328777,1.7769698768363259,      0.68331708567322522918,0.16993166287015945330,-0.45876993166287015945
0.68331708567322522918,0.16993166287015945330,-0.45876993166287015945, 0.74833170088701335555,0.39282417045310548325,1.1341687796929780
....

I had to change the DataSet iterator:

trainIter = new SequenceRecordReaderDataSetIterator(reader, BATCH_SIZE, 3, 3, true);

and I am not sure if this is correct.

I am getting some output now. Whether it makes sense probably has more to do with my network configuration, which I still need to figure out.

This was a lot of help.
Thank you!