Cannot do LSTM -> Dense in Sequence Classification

Hi awesome people!

I’m using DL4J 1.0.0-beta6 for a Sequence Classification problem.
I’m now trying to combine LSTM with Dense layers, without success. It seems the ‘labelMask’ is causing the problem.

Dimensions: minibatch size 18 x 5 classes x 120 time steps

First (minimal) approach, using ‘LastTimeStep’ to project [18, 5, 120] onto [18, 5]:

.list()
.layer(0, new LastTimeStep(new LSTM.Builder()
    //config
    .build()))
.layer(1, new OutputLayer.Builder()
    //config
    .build())
.build()

However, I cannot get this to work. These are the errors for two different SequenceRecordReaderDataSetIterator.AlignmentMode values:

  • ALIGN_END: “Invalid mask array. […] Mask shape: [18, 120], output shape: [18, 5]”
  • EQUAL_LENGTH: “Labels and preOutput must have equal shapes: got shapes [18, 5, 1] vs [18, 5]”
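For reference, here is a plain-Java sketch (no DL4J dependency) of what wrapping a layer in LastTimeStep does conceptually: for each example it keeps the activations at the last time step that the feature mask marks as present, turning [minibatch, size, timeSteps] into [minibatch, size]. The array layout and the helper name lastTimeStep are illustrative assumptions, not DL4J’s implementation.

```java
/** Plain-Java sketch of LastTimeStep semantics; not DL4J code. */
final class LastTimeStepSketch {

    /**
     * @param activations shape [minibatch][size][timeSteps]
     * @param featureMask shape [minibatch][timeSteps], 1 = present, 0 = padding; null = no mask
     * @return shape [minibatch][size], activations at each example's last present step
     */
    static double[][] lastTimeStep(double[][][] activations, double[][] featureMask) {
        int mb = activations.length;
        int size = activations[0].length;
        int t = activations[0][0].length;
        double[][] out = new double[mb][size];
        for (int i = 0; i < mb; i++) {
            int last = t - 1; // without a mask, the final step is used
            if (featureMask != null) {
                // Find the last step the mask marks as present.
                last = -1;
                for (int s = 0; s < t; s++) {
                    if (featureMask[i][s] != 0.0) last = s;
                }
                if (last < 0) throw new IllegalArgumentException("example " + i + " is fully masked");
            }
            for (int j = 0; j < size; j++) {
                out[i][j] = activations[i][j][last];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // One example, size 2, three steps; the third step is padding.
        double[][][] act = {{{1, 2, 9}, {3, 4, 9}}};
        double[][] mask = {{1, 1, 0}};
        double[][] out = lastTimeStep(act, mask);
        System.out.println(out[0][0] + " " + out[0][1]); // prints "2.0 4.0"
    }
}
```

The output has no time axis left, which is why a label mask of shape [18, 120] cannot be applied to it.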

Second (minimal) approach, using ‘RnnToFeedForwardPreProcessor’:

.list()
.layer(0, new LSTM.Builder()
    //config
    .build())
.layer(1, new OutputLayer.Builder()
    //config
    .build())
.inputPreProcessor(1, new RnnToFeedForwardPreProcessor())
.build()

Unfortunately, I can’t get this to work either, and I get similar exceptions:

  • ALIGN_END: “Invalid mask array […] Mask shape: [18, 120], output shape: [2160, 5]”
  • EQUAL_LENGTH: “Incorrect number of arguments for permute function: got arguments [0, 2, 1] for rank 2 array. Number of arguments must equal array rank”
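The [2160, 5] in the first error comes from the reshape the preprocessor performs: the time dimension is folded into the minibatch dimension, so [18, 5, 120] becomes [18 x 120, 5] = [2160, 5], one row per (example, time step). A [18, 120] mask cannot line up with that. The plain-Java sketch below only illustrates the shape change; the helper name is hypothetical and the example-major row ordering is an assumption that may differ from DL4J’s internal ordering.

```java
/** Plain-Java sketch of an RNN-to-feed-forward reshape; not DL4J code. */
final class RnnToFfSketch {

    /** [minibatch][size][timeSteps] -> [minibatch * timeSteps][size], example-major rows (assumed order). */
    static double[][] rnnToFeedForward(double[][][] x) {
        int mb = x.length;
        int size = x[0].length;
        int t = x[0][0].length;
        double[][] out = new double[mb * t][size];
        for (int i = 0; i < mb; i++) {
            for (int s = 0; s < t; s++) {
                for (int j = 0; j < size; j++) {
                    // Row i*t + s holds example i at time step s.
                    out[i * t + s][j] = x[i][j][s];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] flat = rnnToFeedForward(new double[18][5][120]);
        System.out.println(flat.length + " x " + flat[0].length); // prints "2160 x 5"
    }
}
```

In other words, this preprocessor keeps every time step as a separate row rather than selecting the last one, so it suits per-time-step outputs rather than one label per sequence.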

Any pointers are greatly appreciated. Just let me know if you need more info. Thank you so much!

PS. Just a pure input → LSTM → RnnOutputLayer network (with AlignmentMode.ALIGN_END) works fine.

In this case you don’t even need a label mask. It is used to mask out time steps in a sequence output.
Since you only have a single output per example when you use an output layer like that at the end, the mask array doesn’t make sense.
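To make the role of the label mask concrete: with a per-time-step output, every step contributes a loss term, and the label mask zeroes out the terms for steps that carry no label. The plain-Java sketch below uses a hypothetical maskedMeanLoss helper; averaging over the unmasked steps is a simplifying assumption, and DL4J’s exact normalization may differ.

```java
/** Plain-Java sketch of per-time-step loss masking; not DL4J code. */
final class MaskedLossSketch {

    /**
     * @param stepLoss  per-step loss values, shape [minibatch][timeSteps]
     * @param labelMask shape [minibatch][timeSteps], 1 = step has a label, 0 = ignore
     * @return mean loss over the unmasked steps only
     */
    static double maskedMeanLoss(double[][] stepLoss, double[][] labelMask) {
        double sum = 0.0;
        double count = 0.0;
        for (int i = 0; i < stepLoss.length; i++) {
            for (int s = 0; s < stepLoss[i].length; s++) {
                sum += stepLoss[i][s] * labelMask[i][s];
                count += labelMask[i][s];
            }
        }
        return count == 0.0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        double[][] loss = {{5.0, 5.0, 0.5}};
        double[][] mask = {{0, 0, 1}}; // only the last step carries a label
        System.out.println(maskedMeanLoss(loss, mask)); // prints "0.5"
    }
}
```

With a 2D output there is only one loss term per example, so there is nothing for such a mask to select.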

I understand, but how can I achieve that? Apologies, but I couldn’t find any examples of this and I’m rather stuck.

I even tried supplying null as AlignmentMode to the SequenceRecordReaderDataSetIterator to no avail (it simply defaults to ALIGN_START in the underlying RecordReaderMultiDataSetIterator).

It looks like the issue arises when the SequenceRecordReaderDataSetIterator creates the MultiDataSet (i.e. a minibatch). There, it decides on its own whether there should be a ‘labelMask’ or not:

RecordReaderMultiDataSetIterator:615

for (List<List<Writable>> c : list) {
    if (c.size() < maxTSLength)
        needMaskArray = true;
}

Since I’m doing classification, I’ve got 120 time steps in the feature file but only 1 time step in the label file. Because of this, a label mask gets created in RecordReaderMultiDataSetIterator#convertFeaturesOrLabels, which is then passed on to the network. As you’re saying, this doesn’t make sense for the LastTimeStep wrapper, which leads to a dimension mismatch.
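A rough plain-Java sketch of why the [18, 120] label mask appears: the single label step is shorter than the longest sequence in the minibatch (the 120 feature steps), so the labels get padded, and with ALIGN_END the one label lands on the final step while the mask marks only that step as present. The helpers needMaskArray and alignEndMask are hypothetical simplifications of the iterator’s logic, not its actual code.

```java
/** Plain-Java sketch of ALIGN_END label padding/masking; simplified, not DL4J code. */
final class AlignEndMaskSketch {

    /** Mirrors the quoted check: a mask is needed if any sequence is shorter than the longest one. */
    static boolean needMaskArray(int[] sequenceLengths, int maxTSLength) {
        for (int len : sequenceLengths) {
            if (len < maxTSLength) return true;
        }
        return false;
    }

    /** Mask for a sequence of length len, right-aligned (ALIGN_END) within maxTSLength steps. */
    static double[] alignEndMask(int len, int maxTSLength) {
        double[] mask = new double[maxTSLength];
        for (int s = maxTSLength - len; s < maxTSLength; s++) {
            mask[s] = 1.0;
        }
        return mask;
    }

    public static void main(String[] args) {
        int maxTSLength = 120; // longest sequence (the features)
        int labelLength = 1;   // one label step per example
        System.out.println(needMaskArray(new int[] {labelLength}, maxTSLength)); // prints "true"
        double[] mask = alignEndMask(labelLength, maxTSLength);
        System.out.println(mask[118] + " " + mask[119]); // prints "0.0 1.0"
    }
}
```

Stacking one such row per example gives the [18, 120] mask from the error message.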

Is there a way to solve this for pure seq classification or do I really need to also provide 120 labels?

Thanks!

You are right: when the iterator creates a sequence dataset, it always creates it in sequence format. But there is a simple way to solve that: LabelLastTimeStepPreProcessor.

Just set this as the preprocessor on the iterator and it should take care of the problem.
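Conceptually, the idea is to turn the padded 3D labels [minibatch, nOut, timeSteps] into 2D labels [minibatch, nOut] by keeping each example’s last labelled step, after which no label mask is needed, so the labels line up with the [18, 5] output of LastTimeStep + OutputLayer. The plain-Java sketch below illustrates that idea with a hypothetical toLastStepLabels helper; it is not the DL4J implementation.

```java
/** Plain-Java sketch of reducing padded sequence labels to last-step 2D labels; not DL4J code. */
final class LastStepLabelsSketch {

    /**
     * @param labels    shape [minibatch][nOut][timeSteps], e.g. one-hot at the labelled step
     * @param labelMask shape [minibatch][timeSteps], 1 = labelled step; null = use the final step
     * @return shape [minibatch][nOut]; a label mask is no longer needed for this 2D result
     */
    static double[][] toLastStepLabels(double[][][] labels, double[][] labelMask) {
        int mb = labels.length;
        int nOut = labels[0].length;
        int t = labels[0][0].length;
        double[][] out = new double[mb][nOut];
        for (int i = 0; i < mb; i++) {
            int last = t - 1;
            if (labelMask != null) {
                // Scan back to the last labelled step.
                while (last >= 0 && labelMask[i][last] == 0.0) last--;
                if (last < 0) throw new IllegalArgumentException("example " + i + " has no labelled step");
            }
            for (int c = 0; c < nOut; c++) {
                out[i][c] = labels[i][c][last];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 18 examples, 5 classes, 120 steps; ALIGN_END puts class 3 at the final step.
        double[][][] labels = new double[18][5][120];
        double[][] mask = new double[18][120];
        for (int i = 0; i < 18; i++) {
            labels[i][3][119] = 1.0;
            mask[i][119] = 1.0;
        }
        double[][] y = toLastStepLabels(labels, mask);
        System.out.println(y.length + " x " + y[0].length + ", y[0][3] = " + y[0][3]); // prints "18 x 5, y[0][3] = 1.0"
    }
}
```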

Thank you for getting back so quickly!

I think the trick was the combination of using a LabelLastTimeStepPreProcessor and wrapping the LSTM layer in a LastTimeStep layer, such that:
Input -> LabelLastTimeStepPreProcessor -> LastTimeStep( LSTM ) -> OutputLayer.

Am I understanding it correctly that this basic setup will more or less be equivalent to Input -> LSTM -> RnnOutputLayer? That is, we are still using the full sequence, effectively unrolling the LSTM and hooking up the last step to the next (2D) layer?

Ah, I see you didn’t make use of the .setInputType feature. That also adds the necessary pre-processors for layer type changes automatically.

Yes, that is what is effectively happening here.
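One way to see the equivalence is through the loss. If the RnnOutputLayer’s label mask keeps only the final step (as ALIGN_END does here with a single label), the masked per-step loss reduces to the loss at that one step, which is what OutputLayer computes on the LastTimeStep output (same LSTM activations, same dense weights). The plain-Java sketch below checks this for softmax with multi-class cross-entropy; the helper names are hypothetical and the averaging convention is simplified.

```java
/** Plain-Java check: masked per-step loss (last step only) equals 2D loss on the last step. */
final class EquivalenceSketch {

    /** Softmax cross-entropy for one logit vector and a target class index. */
    static double crossEntropy(double[] logits, int target) {
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) max = Math.max(max, z);
        double sumExp = 0.0;
        for (double z : logits) sumExp += Math.exp(z - max);
        // -log softmax(target), computed stably by subtracting the max logit
        return -(logits[target] - max - Math.log(sumExp));
    }

    /** Sequence loss: per-step cross-entropy over masked-in steps, averaged over those steps. */
    static double maskedSequenceLoss(double[][] logitsPerStep, int target, double[] mask) {
        double sum = 0.0;
        double count = 0.0;
        for (int s = 0; s < logitsPerStep.length; s++) {
            if (mask[s] != 0.0) {
                sum += crossEntropy(logitsPerStep[s], target);
                count += 1.0;
            }
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // Logits of one example at 3 time steps (shape [timeSteps][classes]), target class 2.
        double[][] logits = {{2.0, 0.1, -1.0}, {0.3, 0.3, 0.3}, {-0.5, 0.2, 1.7}};
        double[] lastOnly = {0, 0, 1}; // label only at the final step
        double seq = maskedSequenceLoss(logits, 2, lastOnly);
        double last = crossEntropy(logits[2], 2);
        System.out.println(seq == last); // prints "true"
    }
}
```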

Thank you for the confirmation and thank you so much for the help! This is really exciting!

All in all this is what I did, in case someone else stumbles upon this in the future:

1: Set up the data set iterator with a LabelLastTimeStepPreProcessor (this reduces the 3D sequence labels to 2D labels taken at the last time step and removes the labelMask, for use with a 2D output layer instead of an RNN output layer):

DataSetIterator train = new SequenceRecordReaderDataSetIterator(
    featureReader, outcomeReader, batchSize, classes.size(), false, AlignmentMode.ALIGN_END);
train.setPreProcessor(new LabelLastTimeStepPreProcessor());

2: Wrap LSTM in LastTimeStep (lets you hook up the LSTM ‘classification output’ to a 2D downstream layer):

.list()
.layer(0, new LastTimeStep(new LSTM.Builder()
    //config
    .build()))
.layer(1, new OutputLayer.Builder()
    //config
    .build())
.build()

Thanks for pointing out the .setInputType feature, but I’m not sure how I could’ve used it as a shortcut here.