It only looks weird because Keras and DL4J use a slightly different memory layout for their LSTM weights.
As you know, LSTMs are a bit more complicated than just y_t = h(W*x_t + RW*y_(t-1) + b) (which is a SimpleRNN), and therefore they have more logical weights than just W and RW.
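Spelled out in the same notation, the standard LSTM update has four gate blocks (i = input, f = forget, c = cell candidate, o = output), each with its own slice of W, RW and b:

```
i_t = sigmoid(W_i*x_t + RW_i*y_(t-1) + b_i)   # input gate
f_t = sigmoid(W_f*x_t + RW_f*y_(t-1) + b_f)   # forget gate
g_t = tanh(W_c*x_t + RW_c*y_(t-1) + b_c)      # cell candidate
o_t = sigmoid(W_o*x_t + RW_o*y_(t-1) + b_o)   # output gate
c_t = f_t ⊙ c_(t-1) + i_t ⊙ g_t               # cell state (⊙ = element-wise)
y_t = o_t ⊙ tanh(c_t)                         # output
```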
However, both Keras and DL4J pack those per-gate weights into the same two matrices.
In Keras, the gate order is i, f, c, o:
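So the packed kernel you get from layer.get_weights()[0] has shape (input_dim, 4*units) and slices up like this (just a numpy sketch; the variable names are mine):

```python
import numpy as np

units, input_dim = 32, 8                       # sizes just for illustration
kernel = np.random.rand(input_dim, 4 * units)  # stand-in for layer.get_weights()[0]

# Keras concatenates the gate blocks along the last axis in the order i, f, c, o:
W_i = kernel[:, 0*units : 1*units]  # input gate
W_f = kernel[:, 1*units : 2*units]  # forget gate
W_c = kernel[:, 2*units : 3*units]  # cell candidate
W_o = kernel[:, 3*units : 4*units]  # output gate
```

The recurrent kernel and the bias are packed the same way.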
In DL4J, the gate order is c, f, o, i:
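Assuming the same kind of (n_in, 4*n_out) packing along the last axis, DL4J's blocks come out in a different order (again just a numpy sketch, not DL4J's actual API):

```python
import numpy as np

n_in, n_out = 8, 32                  # sizes just for illustration
W = np.random.rand(n_in, 4 * n_out)  # stand-in for DL4J's packed input weights

# Same packed layout, but the gate blocks are ordered c, f, o, i:
W_c = W[:, 0*n_out : 1*n_out]  # cell candidate
W_f = W[:, 1*n_out : 2*n_out]  # forget gate
W_o = W[:, 2*n_out : 3*n_out]  # output gate
W_i = W[:, 3*n_out : 4*n_out]  # input gate
```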
So that’s why you see a difference in the outputs: the same logical weights end up at different offsets in the packed matrices.
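If you want the two to match (e.g. when porting weights by hand), you can simply permute the four blocks. A minimal sketch, assuming the gates are packed along the last axis as above (reorder_gates is my own helper name, not part of either framework):

```python
import numpy as np

def reorder_gates(packed, units, src=("i", "f", "c", "o"), dst=("c", "f", "o", "i")):
    """Permute the gate blocks of a packed LSTM weight matrix or bias vector.

    `packed` has shape (..., 4*units), with the gate blocks concatenated
    along the last axis in `src` order; returns the same data in `dst` order.
    """
    blocks = {g: packed[..., k*units:(k+1)*units] for k, g in enumerate(src)}
    return np.concatenate([blocks[g] for g in dst], axis=-1)

# Works for W, RW and b alike, since all three are packed the same way:
units = 32
keras_bias = np.random.rand(4 * units)
dl4j_order_bias = reorder_gates(keras_bias, units)  # now ordered c, f, o, i
```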