Cannot do LSTM -> Dense in Sequence Classification

Hi awesome people!

I’m using DL4J 1.0.0-beta6 for a Sequence Classification problem.
I’m now trying to combine an LSTM with dense layers, with no success: the ‘labelMask’ doesn’t seem to behave well.

Dimensions: minibatch size 18 x 5 classes x 120 time steps

First (minimal) approach, using ‘LastTimeStep’ to project [18, 5, 120] onto [18, 5]:

.list()
.layer(0, new LastTimeStep(new LSTM.Builder()
    //config
    .build()))
.layer(1, new OutputLayer.Builder()
    //config
    .build())
.build()

However, I cannot get this to work. The exceptions for two different SequenceRecordReaderDataSetIterator.AlignmentMode values:

  • ALIGN_END: “Invalid mask array. […] Mask shape: [18, 120], output shape: [18, 5]”
  • EQUAL_LENGTH: “Labels and preOutput must have equal shapes: got shapes [18, 5, 1] vs [18, 5]”

Second (minimal) approach, using ‘RnnToFeedForwardPreProcessor’:

.list()
.layer(0, new LSTM.Builder()
    //config
    .build())
.layer(1, new OutputLayer.Builder()
    //config
    .build())
.inputPreProcessor(1, new RnnToFeedForwardPreProcessor())
.build()

Unfortunately, I can’t get this to work either and encounter similar exceptions:

  • ALIGN_END: “Invalid mask array […] Mask shape: [18, 120], output shape: [2160, 5]”
  • EQUAL_LENGTH: “Incorrect number of arguments for permute function: got arguments [0, 2, 1] for rank 2 array. Number of arguments must equal array rank”

Any pointers are greatly appreciated. Just let me know if you need more info. Thank you so much!

PS. Just a pure input -> LSTM -> RnnOutputLayer network (with AlignmentMode.ALIGN_END) works fine.
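For reference, that working pure-RNN setup looks roughly like this (same fragment style as above, with the layer hyperparameters elided):

```java
.list()
.layer(0, new LSTM.Builder()
    //config
    .build())
.layer(1, new RnnOutputLayer.Builder()
    //config
    .build())
.build()
```

Here the RnnOutputLayer emits a prediction per time step, so the [18, 120] label mask shape matches and everything lines up.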

In this case you don’t even need a label mask; it is used to mask out the time steps in a sequence output.
As you have only a single output when using an output layer like that at the end, the mask array doesn’t make sense.

I understand, but how can I achieve that? Apologies, but I couldn’t find any examples of this and I’m rather stuck.

I even tried supplying null as AlignmentMode to the SequenceRecordReaderDataSetIterator to no avail (it simply defaults to ALIGN_START in the underlying RecordReaderMultiDataSetIterator).

It looks like the issue arises when the SequenceRecordReaderDataSetIterator creates the MultiDataSet (i.e. a minibatch). There, it simply decides on its own whether there should be a ‘labelMask’ or not:

RecordReaderMultiDataSetIterator:615

for (List<List<Writable>> c : list) {
    if (c.size() < maxTSLength)
        needMaskArray = true;
}

Since I’m doing classification I’ve got 120 timesteps in the feature file but only 1 timestep in the label file. Because of this, a label mask gets created in RecordReaderMultiDataSetIterator#convertFeaturesOrLabels which is then passed onto the network. As you’re saying, this doesn’t make sense for the LastTimeStep wrapper leading to a dimension mismatch.

Is there a way to solve this for pure seq classification or do I really need to also provide 120 labels?

Thanks!

You are right: when creating labels from a sequence, it always creates them in sequence format. But there is a simple way to solve that: LabelLastTimeStepPreProcessor

Just set this as the preprocessor on the iterator and it should take care of the problem.

Thank you for getting back so quickly!

I think the trick was the combination of using a LabelLastTimeStepPreProcessor and wrapping the LSTM layer in a LastTimeStep layer, such that:
Input -> LabelLastTimeStepPreProcessor -> LastTimeStep( LSTM ) -> OutputLayer.

Am I understanding it correctly that this basic setup will more or less be equivalent to Input -> LSTM -> RnnOutputLayer? That is, we are still using the full sequence, effectively unrolling the LSTM and hooking up the last step to the next (2D) layer?

Ah, I see you didn’t make use of the .setInputType feature. That also adds the necessary pre-processors for layer type changes automatically.

Yes, that is what is effectively happening here.

Thank you for the confirmation and thank you so much for the help! This is really exciting!

All in all this is what I did, in case someone else stumbles upon this in the future:

1: Set up the data set iterator with a LabelLastTimeStepPreProcessor (this modifies the labelMask shape for use with a 2D output layer instead of an RNN output layer):

DataSetIterator train = new SequenceRecordReaderDataSetIterator(
    featureReader, outcomeReader, batchSize, classes.size(), false, AlignmentMode.ALIGN_END);
train.setPreProcessor(new LabelLastTimeStepPreProcessor());

2: Wrap the LSTM in LastTimeStep (this lets you hook up the LSTM’s ‘classification output’ to a 2D downstream layer):

.list()
.layer(0, new LastTimeStep(new LSTM.Builder()
    //config
    .build()))
.layer(1, new OutputLayer.Builder()
    //config
    .build())
.build()

Thanks for pointing out the .setInputType feature, but I’m not sure how I could’ve used it as a shortcut here.
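If I understand the suggestion correctly, it would only have replaced the manual pre-processor in the second approach: something like the following sketch should let DL4J insert the RnnToFeedForwardPreProcessor (and the nIn values) automatically instead of adding it by hand. Here `numFeatures` is an assumed name for the per-time-step input size; the label-mask problem would presumably still need the LabelLastTimeStepPreProcessor on the iterator.

```java
.list()
.layer(0, new LSTM.Builder()
    //config
    .build())
.layer(1, new OutputLayer.Builder()
    //config
    .build())
// numFeatures = per-time-step input size (placeholder name);
// setInputType adds the RNN -> feed-forward pre-processor automatically
.setInputType(InputType.recurrent(numFeatures))
.build()
```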