Bidirectional LSTM in DL4J based on a Python example

Hello! How can a Bidirectional LSTM network like this be configured with DL4J?

More notes on the architecture used in this example are mentioned there.

Unfortunately we don’t have the bandwidth to explain how to convert specific models.

Take a look at this example of defining a fairly complex LSTM-based model. In order to introduce bidirectionality, all you have to do is wrap the LSTM layer in Bidirectional.

See also this for more LSTM examples:
LSTMCharModellingExample
CompGraphLSTMExample

OK, so currently you don’t have any out-of-the-box Bidirectional example?

Anyway, thanks a lot, I’ll take a look.

No, but adding that is literally just, as the linked doc above says:

new Bidirectional(new LSTM.Builder()....build())
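If I recall the API correctly, the wrapper also takes an optional merge mode (CONCAT is the default, which is why the output size doubles). A minimal sketch, with example sizes:

```java
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.recurrent.Bidirectional;
import org.nd4j.linalg.activations.Activation;

// CONCAT (the default) concatenates the forward and backward activations,
// so the layer's effective output size is 2 * nOut
Bidirectional bd = new Bidirectional(Bidirectional.Mode.CONCAT,
        new LSTM.Builder()
                .nIn(128)   // example sizes; adjust to your data
                .nOut(256)
                .activation(Activation.TANH)
                .build());
```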

I have tried to build the network according to the Python source and the DL4J example. When I try to build this:

ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(0.25))
                .seed(seed)
                .graphBuilder()
                //These are the two inputs to the computation graph
                .addInputs("additionIn", "sumOut")
                .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE), InputType.recurrent(FEATURE_VEC_SIZE))
                //The inputs to the encoder will have size = minibatch x featuresize x timesteps
                //Note that the network only knows of the feature vector size. It does not know how many time steps unless it sees an instance of the data
                .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
                .addLayer("dense2048_1024_1", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm1")
                .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(2*numHiddenNodes).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "dense2048_1024_1")
                .addLayer("dense2048_1024_2", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm2")
                //Create a vertex indicating the very last time step of the encoder layer needs to be directed to other places in the comp graph
                .addVertex("lastTimeStep", new LastTimeStepVertex("additionIn"), "dense2048_1024_2")
                //Create a vertex that allows the duplication of 2d input to a 3d input
                //In this case the last time step of the encoder layer (viz. 2d) is duplicated to the length of the timeseries "sumOut" which is an input to the comp graph
                //Refer to the javadoc for more detail
                .addVertex("duplicateTimeStep", new DuplicateToTimeSeriesVertex("sumOut"), "lastTimeStep")
                .addLayer("dense1096_out", new DenseLayer.Builder().nIn(numHiddenNodes+FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).build(),"sumOut", "duplicateTimeStep")
                //The inputs to the decoder will have size = size of output of last timestep of encoder (numHiddenNodes) + size of the other input to the comp graph,sumOut (feature vector size)
                //.addLayer("decoder", new LSTM.Builder().nIn(FEATURE_VEC_SIZE + numHiddenNodes).nOut(numHiddenNodes).activation(Activation.SOFTSIGN).build(), "sumOut", "duplicateTimeStep")
                .addLayer("output", new RnnOutputLayer.Builder().nIn(FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "dense1096_out")
                .setOutputs("output")
                .build();

I’m getting this exception:

Exception in thread "main" org.deeplearning4j.nn.conf.inputs.InvalidInputTypeException: Invalid input type: cannot get subset of non RNN input (got: InputTypeFeedForward(1024))
	at org.deeplearning4j.nn.conf.graph.rnn.LastTimeStepVertex.getOutputType(LastTimeStepVertex.java:101)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.getLayerActivationTypes(ComputationGraphConfiguration.java:536)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.addPreProcessors(ComputationGraphConfiguration.java:449)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration$GraphBuilder.build(ComputationGraphConfiguration.java:1201)
	at seq2seq.BidirectionalLSTM.main(BidirectionalLSTM.java:139)

I can’t figure out from this which particular parts of the network are wired incorrectly.

Am I concatenating the vertices correctly in order to feed the previous dense layer’s output (the topmost blue) and the input (yellow) into another dense layer

with these lines:

.addLayer("dense2048_1024_2", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm2")
.addVertex("lastTimeStep", new LastTimeStepVertex("additionIn"), "dense2048_1024_2")
.addVertex("duplicateTimeStep", new DuplicateToTimeSeriesVertex("sumOut"), "lastTimeStep")
.addLayer("dense1096_out", new DenseLayer.Builder().nIn(numHiddenNodes+FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).build(),"sumOut", "duplicateTimeStep")

According to this example I need no MergeVertex at all, wow!

https://deeplearning4j.konduit.ai/models/computationgraph#example-1-recurrent-network-with-skip-connections

Now I have tried a more correct configuration: according to the docs on MergeVertex, I don’t need to set it up explicitly. I have used dense layers, following the explanation of the Python model, to reduce the LSTM output and reproduce the initial network. So the configuration is now:

ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(0.25))
                .seed(seed)
                .graphBuilder()
                .addInputs("additionIn")
                .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
                .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
                .addLayer("dense2048_1024_1", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm1")
                .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(2*numHiddenNodes).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "dense2048_1024_1")
                .addLayer("dense2048_1024_2", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm2")
                .addLayer("dense1096_out", new DenseLayer.Builder().nIn(numHiddenNodes+FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).build(),"dense2048_1024_2", "additionIn")
                .addLayer("output", new RnnOutputLayer.Builder().nIn(FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "dense1096_out")
                .setOutputs("output")
                .build();

And what happens on build?

Exception in thread "main" org.deeplearning4j.nn.conf.inputs.InvalidInputTypeException: Invalid input: MergeVertex cannot merge activations of different types: first type = FF, input type 2 = RNN
	at org.deeplearning4j.nn.conf.graph.MergeVertex.getOutputType(MergeVertex.java:134)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.getLayerActivationTypes(ComputationGraphConfiguration.java:536)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.addPreProcessors(ComputationGraphConfiguration.java:449)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration$GraphBuilder.build(ComputationGraphConfiguration.java:1201)
	at seq2seq.BidirectionalLSTM.main(BidirectionalLSTM.java:140)

I can’t merge RNN input with FF (it looks like DL4J’s dense layers are not the same as in the Python libs). What are the correct layers to connect LSTMs densely and make the dense concatenation with the RNN input and output?

Everything is easier than I supposed looking at the Python example. At least the network below builds and training starts:

ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(0.25))
                .seed(seed)
                .graphBuilder()
                .addInputs("additionIn")
                .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
                .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
                .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(2*numHiddenNodes).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "bdlstm1")
                .addVertex("merge", new MergeVertex(), "bdlstm2", "additionIn")
                .addLayer("output", new RnnOutputLayer.Builder().nIn(FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "merge")
                .setOutputs("output")
                .build();

So, the configuration that actually let me start training (with the connection from the input to the output concatenation omitted; I still have to figure that out) is:

ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(0.25))
                .seed(seed)
                .graphBuilder()
                .addInputs("additionIn")
                .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
                .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
                .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(numHiddenNodes*2).nOut(FEATURE_VEC_SIZE).activation(Activation.TANH).build()), "bdlstm1")
                .addLayer("output", new RnnOutputLayer.Builder().nIn(FEATURE_VEC_SIZE*2).nOut(FEATURE_VEC_SIZE*2).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "bdlstm2")
                .setOutputs("output")
                .build();

And the most confusing thing there is the nIn and nOut of the layer wrapped inside Bidirectional. It looks like Bidirectional doubles its actual output size behind the scenes, while the wrapped layer’s nIn must be the true (already doubled, if the previous layer was bidirectional) input size. That’s why some layers have 2* multipliers in front of the variables. Is this by design? Otherwise the dimensions don’t match.

When you use .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE)) you can actually drop the nIn definitions entirely, and just use nOut to specify the size of the layer.

But yes, Bidirectional doubles the size in order to accommodate the other direction.
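For example, a sketch of the same kind of network with the nIn calls dropped (assuming the same imports and constants as the configurations above; DL4J infers nIn from the declared input type):

```java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .weightInit(WeightInit.XAVIER)
        .updater(new Adam(0.25))
        .seed(seed)
        .graphBuilder()
        .addInputs("additionIn")
        .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
        // nIn is inferred from the input type; only nOut is needed.
        // Bidirectional still doubles each layer's effective output size.
        .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nOut(numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
        .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nOut(numHiddenNodes).activation(Activation.TANH).build()), "bdlstm1")
        .addVertex("merge", new MergeVertex(), "bdlstm2", "additionIn")
        .addLayer("output", new RnnOutputLayer.Builder().nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "merge")
        .setOutputs("output")
        .build();
```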


And now, the final version of what I wanted from the start:

    ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
            .weightInit(WeightInit.XAVIER)
            .updater(new Adam(0.25))
            .seed(seed)
            .graphBuilder()
            .addInputs("additionIn")
            .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
            .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
            .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(numHiddenNodes*2).nOut(numHiddenNodes).activation(Activation.TANH).build()), "bdlstm1")
            .addVertex("merge", new MergeVertex(), "bdlstm2", "additionIn")
            .addLayer("output", new RnnOutputLayer.Builder().nIn(numHiddenNodes*2+FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "merge")
            .setOutputs("output")
            .build();
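One way to sanity-check the wiring of a built configuration (a sketch, assuming the usual DL4J dependencies on the classpath):

```java
import org.deeplearning4j.nn.graph.ComputationGraph;

ComputationGraph net = new ComputationGraph(configuration);
net.init();
// Prints each layer's name, type, nIn/nOut and parameter count,
// which makes dimension mismatches easy to spot before training
System.out.println(net.summary());
```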