Bidirectional LSTM in DL4J based on a Python example

Hello! How can a Bidirectional LSTM network like this be configured with DL4J?

More notes on the architecture used in this example are mentioned there.

Unfortunately we don’t have the bandwidth to explain how to convert specific models.

Take a look at this example of defining a fairly complex LSTM-based model. In order to introduce bidirectionality, all you have to do is wrap the LSTM layer in Bidirectional.

See also this for more LSTM examples:
LSTMCharModellingExample
CompGraphLSTMExample

OK, so currently you don’t have any out-of-the-box Bidirectional example?

Anyway, thanks a lot, I’ll take a look.

No, but adding that is literally just, as the linked doc above says:

new Bidirectional(new LSTM.Builder()....build())
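If I recall the API correctly, the wrapper also takes an optional merge mode (CONCAT is the default, which is why the output size doubles). A minimal sketch, with example sizes:

```java
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.recurrent.Bidirectional;
import org.nd4j.linalg.activations.Activation;

// CONCAT (the default) concatenates the forward and backward activations,
// so the layer's effective output size is 2 * nOut
Bidirectional bd = new Bidirectional(Bidirectional.Mode.CONCAT,
        new LSTM.Builder()
                .nIn(128)   // example sizes; adjust to your data
                .nOut(256)
                .activation(Activation.TANH)
                .build());
```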

I have tried to build the network according to the Python source and the DL4J example. When I try to build this:

ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(0.25))
                .seed(seed)
                .graphBuilder()
                //These are the two inputs to the computation graph
                .addInputs("additionIn", "sumOut")
                .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE), InputType.recurrent(FEATURE_VEC_SIZE))
                //The inputs to the encoder will have size = minibatch x featuresize x timesteps
                //Note that the network only knows of the feature vector size. It does not know how many time steps unless it sees an instance of the data
                .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
                .addLayer("dense2048_1024_1", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm1")
                .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(2*numHiddenNodes).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "dense2048_1024_1")
                .addLayer("dense2048_1024_2", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm2")
                //Create a vertex indicating the very last time step of the encoder layer needs to be directed to other places in the comp graph
                .addVertex("lastTimeStep", new LastTimeStepVertex("additionIn"), "dense2048_1024_2")
                //Create a vertex that allows the duplication of 2d input to a 3d input
                //In this case the last time step of the encoder layer (viz. 2d) is duplicated to the length of the timeseries "sumOut" which is an input to the comp graph
                //Refer to the javadoc for more detail
                .addVertex("duplicateTimeStep", new DuplicateToTimeSeriesVertex("sumOut"), "lastTimeStep")
                .addLayer("dense1096_out", new DenseLayer.Builder().nIn(numHiddenNodes+FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).build(),"sumOut", "duplicateTimeStep")
                //The inputs to the decoder will have size = size of output of last timestep of encoder (numHiddenNodes) + size of the other input to the comp graph,sumOut (feature vector size)
                //.addLayer("decoder", new LSTM.Builder().nIn(FEATURE_VEC_SIZE + numHiddenNodes).nOut(numHiddenNodes).activation(Activation.SOFTSIGN).build(), "sumOut", "duplicateTimeStep")
                .addLayer("output", new RnnOutputLayer.Builder().nIn(FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "dense1096_out")
                .setOutputs("output")
                .build();

I’m getting this exception:

Exception in thread "main" org.deeplearning4j.nn.conf.inputs.InvalidInputTypeException: Invalid input type: cannot get subset of non RNN input (got: InputTypeFeedForward(1024))
	at org.deeplearning4j.nn.conf.graph.rnn.LastTimeStepVertex.getOutputType(LastTimeStepVertex.java:101)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.getLayerActivationTypes(ComputationGraphConfiguration.java:536)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.addPreProcessors(ComputationGraphConfiguration.java:449)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration$GraphBuilder.build(ComputationGraphConfiguration.java:1201)
	at seq2seq.BidirectionalLSTM.main(BidirectionalLSTM.java:139)

I can’t figure out from this which particular parts of the network are wired incorrectly.

Am I concatenating the vertices correctly in order to feed the previous dense layer’s output (the topmost blue) and the input (yellow) into another dense layer

with these lines:

.addLayer("dense2048_1024_2", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm2")
.addVertex("lastTimeStep", new LastTimeStepVertex("additionIn"), "dense2048_1024_2")
.addVertex("duplicateTimeStep", new DuplicateToTimeSeriesVertex("sumOut"), "lastTimeStep")
.addLayer("dense1096_out", new DenseLayer.Builder().nIn(numHiddenNodes+FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).build(),"sumOut", "duplicateTimeStep")

According to this example I need no MergeVertex at all, wow!

https://deeplearning4j.konduit.ai/models/computationgraph#example-1-recurrent-network-with-skip-connections

Now I have tried a more correct configuration: according to the docs on MergeVertex, I don’t need to set it up explicitly. I have used dense layers, following the explanation of the Python model, to reduce the LSTM output and reproduce the initial network. So the configuration is now:

ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(0.25))
                .seed(seed)
                .graphBuilder()
                .addInputs("additionIn")
                .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
                .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
                .addLayer("dense2048_1024_1", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm1")
                .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(2*numHiddenNodes).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "dense2048_1024_1")
                .addLayer("dense2048_1024_2", new DenseLayer.Builder().nIn(2*numHiddenNodes).nOut(numHiddenNodes).build(),"bdlstm2")
                .addLayer("dense1096_out", new DenseLayer.Builder().nIn(numHiddenNodes+FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).build(),"dense2048_1024_2", "additionIn")
                .addLayer("output", new RnnOutputLayer.Builder().nIn(FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "dense1096_out")
                .setOutputs("output")
                .build();

And what happens on build?

Exception in thread "main" org.deeplearning4j.nn.conf.inputs.InvalidInputTypeException: Invalid input: MergeVertex cannot merge activations of different types: first type = FF, input type 2 = RNN
	at org.deeplearning4j.nn.conf.graph.MergeVertex.getOutputType(MergeVertex.java:134)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.getLayerActivationTypes(ComputationGraphConfiguration.java:536)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.addPreProcessors(ComputationGraphConfiguration.java:449)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration$GraphBuilder.build(ComputationGraphConfiguration.java:1201)
	at seq2seq.BidirectionalLSTM.main(BidirectionalLSTM.java:140)

I can’t merge RNN input with FF (it looks like DL4J’s dense layers are not the same as in the Python libs). What are the correct layers to connect LSTMs densely and make the dense concatenation with the RNN input and output?

Everything is easier than I supposed looking at the Python example. At least the network below builds and training starts:

ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(0.25))
                .seed(seed)
                .graphBuilder()
                .addInputs("additionIn")
                .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
                .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
                .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(2*numHiddenNodes).nOut(2*numHiddenNodes).activation(Activation.TANH).build()), "bdlstm1")
                .addVertex("merge", new MergeVertex(), "bdlstm2", "additionIn")
                .addLayer("output", new RnnOutputLayer.Builder().nIn(FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "merge")
                .setOutputs("output")
                .build();

So, the configuration that actually let me start training (with the connection from the input to the output concatenation omitted; I still have to figure that out) is:

ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
                .weightInit(WeightInit.XAVIER)
                .updater(new Adam(0.25))
                .seed(seed)
                .graphBuilder()
                .addInputs("additionIn")
                .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
                .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
                .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(numHiddenNodes*2).nOut(FEATURE_VEC_SIZE).activation(Activation.TANH).build()), "bdlstm1")
                .addLayer("output", new RnnOutputLayer.Builder().nIn(FEATURE_VEC_SIZE*2).nOut(FEATURE_VEC_SIZE*2).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "bdlstm2")
                .setOutputs("output")
                .build();

And the most confusing thing there is the nIn and nOut of the layer wrapped inside Bidirectional. It looks like Bidirectional doubles its actual output size behind the scenes, while the wrapped layer’s nIn must be the true (already doubled, if the previous layer was bidirectional) input size. That’s why some layers have 2* multipliers in front of the variables. Is this by design? Otherwise the dimensions don’t match.

When you use .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE)) you can actually drop the nIn definitions entirely, and just use nOut to specify the size of the layer.

But yes, Bidirectional doubles the size in order to accommodate the other direction.
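For example, a sketch of the same kind of network with the nIn calls dropped (assuming the same imports and constants as the configurations above; DL4J infers nIn from the declared input type):

```java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .weightInit(WeightInit.XAVIER)
        .updater(new Adam(0.25))
        .seed(seed)
        .graphBuilder()
        .addInputs("additionIn")
        .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
        // nIn is inferred from the input type; only nOut is needed.
        // Bidirectional still doubles each layer's effective output size.
        .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nOut(numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
        .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nOut(numHiddenNodes).activation(Activation.TANH).build()), "bdlstm1")
        .addVertex("merge", new MergeVertex(), "bdlstm2", "additionIn")
        .addLayer("output", new RnnOutputLayer.Builder().nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "merge")
        .setOutputs("output")
        .build();
```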


And now, the final version of what I wanted from the start:

    ComputationGraphConfiguration configuration = new NeuralNetConfiguration.Builder()
            .weightInit(WeightInit.XAVIER)
            .updater(new Adam(0.25))
            .seed(seed)
            .graphBuilder()
            .addInputs("additionIn")
            .setInputTypes(InputType.recurrent(FEATURE_VEC_SIZE))
            .addLayer("bdlstm1", new Bidirectional(new LSTM.Builder().nIn(FEATURE_VEC_SIZE).nOut(numHiddenNodes).activation(Activation.TANH).build()), "additionIn")
            .addLayer("bdlstm2", new Bidirectional(new LSTM.Builder().nIn(numHiddenNodes*2).nOut(numHiddenNodes).activation(Activation.TANH).build()), "bdlstm1")
            .addVertex("merge", new MergeVertex(), "bdlstm2", "additionIn")
            .addLayer("output", new RnnOutputLayer.Builder().nIn(numHiddenNodes*2+FEATURE_VEC_SIZE).nOut(FEATURE_VEC_SIZE).activation(Activation.SOFTMAX).lossFunction(LossFunctions.LossFunction.MCXENT).build(), "merge")
            .setOutputs("output")
            .build();
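One way to sanity-check the wiring of a built configuration (a sketch, assuming the usual DL4J dependencies on the classpath):

```java
import org.deeplearning4j.nn.graph.ComputationGraph;

ComputationGraph net = new ComputationGraph(configuration);
net.init();
// Prints each layer's name, type, nIn/nOut and parameter count,
// which makes dimension mismatches easy to spot before training
System.out.println(net.summary());
```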