Training data shaping

Take a look at Quickstart with Deeplearning4J – dubs·tech to learn more about transforming the data for training in general.

Your main problem here, though, is that JDBCRecordReader isn’t meant to be used for sequences, so each row in your result set is going to be a single, independent example.

Going just by the information you’ve given with your example data, I guess that you have a single sequence of values and you want to use it to train your model to predict the future. This means you are trying to build an autoregressive model.

I’ll try to explain the concepts in the context of an autoregressive model then:

  • features: Your variables at time t
  • labels: Your variables at time t+1
  • examples: a set of features and labels
  • input size: number of features
  • output size: number of labels
  • time series length: the number of steps you give as context before predicting the next step
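To make those terms concrete, here’s a tiny self-contained sketch (plain Java, and I’m assuming a univariate series just for illustration) of how a single sequence turns into examples under those definitions:

```java
public class AutoregressiveExamples {
    public static void main(String[] args) {
        // A single univariate sequence: one variable observed over 5 time steps,
        // so input size = output size = 1 in this toy case.
        double[] series = {1.0, 2.0, 3.0, 4.0, 5.0};

        for (int t = 0; t < series.length - 1; t++) {
            double feature = series[t];      // your variable at time t
            double label   = series[t + 1]; // the same variable at time t+1
            System.out.printf("example %d: feature=%.1f -> label=%.1f%n", t, feature, label);
        }
    }
}
```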

There are many ways you can approach an autoregressive model.

The easiest way is to take your variables at time t as your features and their values at time t + 1 as the labels. This can be done with a simple feed-forward network.
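A minimal version of such a network in DL4J could look something like this. It’s an untested sketch; `numVariables` and all the layer sizes and hyperparameters are assumptions you’d replace with your own:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class OneStepFeedForward {
    public static void main(String[] args) {
        int numVariables = 3; // assumption: 3 variables per time step

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .updater(new Adam(1e-3))
            .list()
            // features: your variables at time t
            .layer(new DenseLayer.Builder().nIn(numVariables).nOut(32)
                .activation(Activation.RELU).build())
            // labels: the same variables at time t+1, so nOut == numVariables
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY).nIn(32).nOut(numVariables).build())
            .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        // then: net.fit(yourDataSetIterator);
    }
}
```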

That approach has the problem that it doesn’t have any idea about the past. If you want to incorporate knowledge about previous steps, you can use the values of your variables at times t - n, t - (n - 1), …, t as the features and predict the value at time t + 1. That too can still be a simple feed-forward network.
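The data preparation for that windowed variant might look like this (univariate again, and the window size is an arbitrary pick):

```java
import java.util.Arrays;

public class WindowedExamples {
    public static void main(String[] args) {
        double[] series = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int n = 4; // use values at t-4 .. t as features — an arbitrary choice

        for (int t = n; t < series.length - 1; t++) {
            // features: the last n+1 values; label: the next value
            double[] features = Arrays.copyOfRange(series, t - n, t + 1);
            double label = series[t + 1];
            System.out.println(Arrays.toString(features) + " -> " + label);
            // feed these to a feed-forward net with nIn = n + 1 and nOut = 1
        }
    }
}
```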

Extending the inputs to incorporate additional knowledge about the past is nice, but it may be limiting, because you can’t handle arbitrarily long sequences that way. So if you need to handle arbitrary sequence lengths, you can use either a recurrent network or a 1D convolutional network.

A recurrent network will work through the sequence in order, while a convolutional network has a more relaxed relationship with ordering (and typically needs at least a few past steps to be applicable at all).

The convolutional approach will be more similar to the regular feed-forward network, as it takes in a sequence and returns a single value.
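In DL4J terms, the convolutional variant might be sketched like this: a `Convolution1DLayer` over the sequence, a global pooling layer to collapse the time dimension, and a normal output layer for the single predicted step. Again untested, and every size in here is an assumption:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.Convolution1DLayer;
import org.deeplearning4j.nn.conf.layers.GlobalPoolingLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.PoolingType;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class Conv1DForecaster {
    public static void main(String[] args) {
        int numVariables = 3; // assumption

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .list()
            .layer(new Convolution1DLayer.Builder()
                .kernelSize(3) // looks at 3 consecutive time steps at once
                .nIn(numVariables).nOut(16)
                .activation(Activation.RELU).build())
            // collapse the time axis so we end up with a single prediction
            .layer(new GlobalPoolingLayer.Builder(PoolingType.MAX).build())
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY).nIn(16).nOut(numVariables).build())
            .setInputType(InputType.recurrent(numVariables))
            .build();
    }
}
```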

The recurrent approach will take a sequence and can return either a single value or a sequence of values. When you have a sequence of values as both the features and the labels, you essentially start your feature sequence at time t = 0 and the label sequence at time t = 1. That would produce a single, probably very long, example for your network.
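A sketch of that sequence-to-sequence setup in DL4J would be an `LSTM` layer followed by an `RnnOutputLayer`, so the network emits a prediction at every time step. The sizes are placeholders again:

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class SeqToSeqLstm {
    public static void main(String[] args) {
        int numVariables = 3; // assumption

        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .list()
            .layer(new LSTM.Builder().nIn(numVariables).nOut(64)
                .activation(Activation.TANH).build())
            // one output per time step: the label sequence is the feature
            // sequence shifted one step into the future
            .layer(new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .activation(Activation.IDENTITY).nIn(64).nOut(numVariables).build())
            .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
    }
}
```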

Having a single long example has many downsides, one of them being that it requires a lot of resources and isn’t very efficient to train on. Instead it usually makes sense to split the sequence into shorter sub-sequences, as that gives you more parallelism during training, requires fewer resources, and usually trains better.
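The splitting itself can be as simple as cutting the long sequence into fixed-length windows, with the label window shifted one step ahead of the feature window. A plain-Java sketch, where `sequence` is time steps × variables and the window length is an arbitrary pick:

```java
import java.util.Arrays;
import java.util.List;

public class SequenceSplitter {
    // sequence: [timeSteps][numVariables]; len: sub-sequence length
    static void split(double[][] sequence, int len,
                      List<double[][]> featureSeqs, List<double[][]> labelSeqs) {
        for (int start = 0; start + len < sequence.length; start += len) {
            // features: steps start .. start+len-1
            featureSeqs.add(Arrays.copyOfRange(sequence, start, start + len));
            // labels: the same window shifted one step into the future
            labelSeqs.add(Arrays.copyOfRange(sequence, start + 1, start + len + 1));
        }
    }
}
```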

Another thing to note is that recurrent networks tend to “forget” the past beyond about 15 steps for LSTMs and about 5 to 7 steps for simpler RNNs. So having extremely long sequences doesn’t necessarily help from a “give it as much context as possible” perspective either.

In your case longer sequences should still work, because you’ve got an autoregressive situation, but keep that in mind anyway.

After this wall of text, I’d suggest that you simply prepare your data into a format that a pre-existing SequenceRecordReader can use, and go from there.
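For example, if you dump each sub-sequence into its own CSV file (one set of files for the features, one for the labels shifted by a step), something along these lines should get you a training iterator. The paths, file count and batch size are placeholders:

```java
import org.datavec.api.records.reader.SequenceRecordReader;
import org.datavec.api.records.reader.impl.csv.CSVSequenceRecordReader;
import org.datavec.api.split.NumberedFileInputSplit;
import org.deeplearning4j.datasets.datavec.SequenceRecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

public class SequenceDataSetup {
    public static void main(String[] args) throws Exception {
        SequenceRecordReader features = new CSVSequenceRecordReader(0, ",");
        features.initialize(new NumberedFileInputSplit("/path/to/features_%d.csv", 0, 99));

        SequenceRecordReader labels = new CSVSequenceRecordReader(0, ",");
        labels.initialize(new NumberedFileInputSplit("/path/to/labels_%d.csv", 0, 99));

        // regression = true, so the numPossibleLabels argument (-1) is ignored
        DataSetIterator iterator = new SequenceRecordReaderDataSetIterator(
            features, labels, 32, -1, true);
    }
}
```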
