Imported Keras LSTM layer mismatch

I tried to import a Keras model consisting of a Masking layer, LSTM(64), and then two Dense layers.
It takes one-hot encoded vectors as input, which are pre-padded to a fixed length.
In Java I simply used this to import the model from the h5 file saved by Keras:

MultiLayerNetwork model= KerasModelImport.importKerasSequentialModelAndWeights(model_weight_struc_path);

After importing the model, I checked the Dense layers: their weights and biases matched the Python version. However, the LSTM layer weights don’t match up.

If I just run the input through and get the LSTM layer output using the feedForward().get() method, it doesn’t match the output of the same LSTM layer in the Python model.

I assume there is some difference because in Keras the input shape is (1, timesteps, features), while in DL4J it is (1, features, timesteps).
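A quick numpy sketch of that axis swap (toy shapes and indices, just for illustration):

```python
import numpy as np

# Keras layout: (batch, timesteps, features)
keras_batch = np.zeros((1, 3, 5))
keras_batch[0, 1, 2] = 1.0  # one-hot: timestep 1, feature 2

# DL4J layout: (batch, features, timesteps) -- swap the last two axes
dl4j_batch = np.transpose(keras_batch, (0, 2, 1))

print(keras_batch.shape)    # (1, 3, 5)
print(dl4j_batch.shape)     # (1, 5, 3)
print(dl4j_batch[0, 2, 1])  # 1.0 -- same element, indices swapped
```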

So I created a toy model:

    model=Sequential()
    model.add(Masking(mask_value=0.0,input_shape=(3,5)))
    model.add(LSTM(6))
    model.add(Dense(1,activation='sigmoid'))

Then in Python, model.layers[1].get_weights()[1] gives me a (6, 24) matrix.

But in Java, calling System.out.println(model.getParam("1_RW")) gives me this (6, 24) matrix:

    [[    0.0088,   -0.2417,   -0.2305,    0.2706,    0.1590,    0.3007,   -0.0555,    0.0270,    0.3155,   -0.0984,   -0.1430,   -0.0205,   -0.3000,    0.1479,   -0.0466,    0.2184,    0.1548,    0.0750,   -0.0230,   -0.3926,   -0.4117,    0.0292,    0.0271,   -0.2201], 
     [    0.1818,    0.1398,   -0.0671,   -0.0146,   -0.2206,   -0.3618,   -0.0043,    0.5145,    0.0989,    0.1930,   -0.3135,    0.2240,   -0.0753,   -0.1024,    0.0655,   -0.0538,    0.2332,   -0.0445,    0.0382,   -0.0611,   -0.2177,   -0.1036,   -0.3807,    0.1219], 
     [   -0.2991,    0.1280,    0.5051,    0.5081,    0.2659,   -0.0082,   -0.2817,    0.2300,   -0.0058,   -0.1414,   -0.0094,    0.0928,   -0.1039,    0.0323,    0.0075,    0.0445,    0.0116,   -0.0360,   -0.0148,    0.0854,    0.1633,    0.2460,   -0.1953,    0.0297], 
     [    0.3264,    0.2795,   -0.0577,   -0.3209,    0.3719,    0.3687,   -0.0624,    0.3096,   -0.0187,    0.0080,    0.1646,    0.1298,    0.2549,    0.2799,   -0.0596,   -0.0176,    0.1987,   -0.0776,    0.0796,   -0.0164,    0.0126,    0.3000,    0.0046,    0.0355], 
     [   -0.0691,    0.2839,    0.0199,   -0.0236,    0.0436,   -0.0590,   -0.1854,   -0.3150,    0.2103,   -0.0140,    0.2179,    0.1249,    0.0680,    0.3275,   -0.2206,    0.1803,   -0.1980,   -0.0579,    0.0450,   -0.0770,   -0.2042,   -0.3926,   -0.3038,    0.3681], 
     [    0.0067,   -0.0010,   -0.0336,    0.1233,   -0.1479,    0.2217,   -0.1695,   -0.1380,   -0.2195,    0.1326,   -0.0877,    0.4220,   -0.0741,    0.1814,    0.2704,    0.1903,    0.3147,    0.1108,   -0.1511,    0.4744,   -0.1200,   -0.1763,    0.2374,    0.0676]]

The ordering is weird: the first element of the first row in ND4J (which is 0.0088) does appear in the first row of the Python matrix, but starting from the middle of it. It then continues up to some point and wraps around to the front of the row.

Can anyone take a look at this?

Edit: Formatted Code and output text for better readability.

It only looks weird because Keras and DL4J use a slightly different memory layout for their LSTM weights.

As you know, LSTMs are a bit more complicated than just y_t = h(W*x + RW*y_(t-1) + b) (which is a SimpleRNN), and therefore they have more logical weights than just W and RW.

However, both Keras and DL4J pack those additional weights into those two matrices.

In Keras the gate order is i, f, c, o, while in DL4J the order is c, f, o, i.

So that’s why there is a difference in the outputs.
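As a sketch of what that repacking means (assuming the standard Keras layout where the recurrent kernel has shape (units, 4*units) with the four gate blocks side by side), reordering the Keras blocks from i, f, c, o to c, f, o, i reproduces the DL4J column layout:

```python
import numpy as np

units = 6
rng = np.random.default_rng(0)
# Keras recurrent kernel: (units, 4*units), gate blocks in order [i, f, c, o]
keras_rw = rng.standard_normal((units, 4 * units))

i, f, c, o = np.split(keras_rw, 4, axis=1)

# DL4J packs the same blocks in order [c, f, o, i]
dl4j_rw = np.concatenate([c, f, o, i], axis=1)

# Each block is unchanged; only the block order differs
assert np.array_equal(dl4j_rw[:, :units], keras_rw[:, 2 * units:3 * units])
```

Note that f stays in the second block in both layouts, which is why part of a row lines up before the wrap-around becomes visible.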

Thanks a lot, I sort of expected that because of the timestep swap compared to the Python version.
Now the input I have in Python has shape (None, 70, 39), where 70 is the timestep count. In Java, I tested with the same sample input with shape (None, 39, 70), with the position of the 1 in the one-hot encoding swapped accordingly.

The end result is that I simply don’t get the same output. Any idea why?

I was using a custom method to do the encoding:

        public INDArray process(String domain) {
            // This is supposed to encode the ArrayList into a one-hot encoded matrix
            //INDArray encoded_matrix = Nd4j.zeros(1, this.encoding_length, this.max_length);
            INDArray encoded_matrix = Nd4j.zeros(DataType.DOUBLE, 1, this.encoding_length, this.max_length);

            //INDArray encoded_matrix = Nd4j.zeros(1, this.max_length, this.encoding_length);

            ArrayList<Integer> strIntArr = StrToArray(domain);

            for (int i : strIntArr) {
                System.out.println(i);
            }

            for (int i = 0; i < strIntArr.size(); i++) {
                if (strIntArr.get(i) != -1) {
                    //encoded_matrix.putScalar(0, i, strIntArr.get(i), 1.0);
                    encoded_matrix.putScalar(new int[]{0, strIntArr.get(i), i}, 1.0);
                }
            }
            return encoded_matrix;
        }

Have you checked that you actually get the input you think you are getting? For example, if you rearrange the dimensions in your Python data to be equivalent to the DL4J shape and then print both of them, are they equal?

In Python, taking a sample with shape (1, 70, 39) for example, I would do this:

# sample_arry holds the integer codes for one sample
sample_input = np.zeros((1, 70, 39))
for i in range(70):
    if sample_arry[i] != -1:
        sample_input[0, i, sample_arry[i]] = 1

For the Java version I would do this:

            for (int i = 0; i < strIntArr.size(); i++) {
                if (strIntArr.get(i) != -1) {
                    //encoded_matrix.putScalar(0, i, strIntArr.get(i), 1.0);
                    encoded_matrix.putScalar(new int[]{0, strIntArr.get(i), i}, 1.0);
                }
            }
            return encoded_matrix;

You see, the difference is whether I put the scalar 1.0 in the second or the third dimension. This is because the timestep in Java is in the third dimension, right?

I’m not talking about the simplified code extracts you are sharing. I’ve spent a lot of time looking for bugs in places where I’d expect them to be only to find that it was something entirely different that messed everything up.

I’m asking you to make a quick sanity check: transpose your numpy array (it should be something like sample_input2 = np.transpose(sample_input, (0,2,1))), print it and compare that to what you get from your java code right before you pass it to the network.

If they are different then we have to continue looking into the data loading / preparing pipeline. If they are identical, we’ll have to create a sample project to figure out what exactly is going on there.

Thanks, I just did; yes, it matches up. I’ll show it in Python here.

toy=np.zeros((1,3,5))
toy[0,1,2]=1
toy[0,2,3]=1

this gives me:

array([[[0., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.]]])

For the Java version:

toy_jav=np.zeros((1,5,3))
toy_jav[0,2,1]=1
toy_jav[0,3,2]=1

this gives me:

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 0.]]])

Now doing np.transpose(toy, (0,2,1)) also gives me:

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 0.]]])

Nothing different.
(Just to mention, I did manually check the encoded matrix for one of the samples and everything matched up.)

Sorry, you seem to have misunderstood me.

I mean in your actual code: Transpose your prediction input in python and print it out, and then print out your prediction input in java.

Sorry about that. I ran this in Java:

        public INDArray process(String domain) {
            // This is supposed to encode the ArrayList into a one-hot encoded matrix
            //INDArray encoded_matrix = Nd4j.zeros(1, this.encoding_length, this.max_length);
            //INDArray encoded_matrix = Nd4j.zeros(DataType.DOUBLE, 1, this.encoding_length, this.max_length);

            INDArray encoded_matrix = Nd4j.zeros(DataType.DOUBLE, 1, 5, 3);
            //INDArray encoded_matrix = Nd4j.zeros(1, this.max_length, this.encoding_length);

            //ArrayList<Integer> strIntArr = StrToArray(domain);
            ArrayList<Integer> strIntArr = new ArrayList<Integer>(Arrays.asList(new Integer[]{-1, 2, 3}));

            for (int i : strIntArr) {
                System.out.println(i);
            }

            for (int i = 0; i < strIntArr.size(); i++) {
                if (strIntArr.get(i) != -1) {
                    //encoded_matrix.putScalar(0, i, strIntArr.get(i), 1.0);
                    encoded_matrix.putScalar(new int[]{0, strIntArr.get(i), i}, 1.0);
                }
            }
            return encoded_matrix;
        }

it gives me this:

[[[         0,         0,         0], 
  [         0,         0,         0], 
  [         0,    1.0000,         0], 
  [         0,         0,    1.0000], 
  [         0,         0,         0]]]

From Python:

toy=np.zeros((1,3,5))
toy[0,1,2]=1
toy[0,2,3]=1
np.transpose(toy,(0,2,1))

which gives me this:

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.],
        [0., 0., 0.]]])

thanks

And if you now put that exact input (non-transposed for the python version) through both keras and dl4j, is the output different?

Hmmm, you’re right, they give the same result. It’s probably the Python preprocessing part; I’ll double-check that. Many thanks!
Good to know about the c, f, o, i order.

You probably won’t believe this: for a much larger model with the same structure, I just get a different result. Same stack of layers, just a larger number of units.

I just ran the actual input through both the Python and Java pre-processing and checked every single row after transposing; everything matched up, but the two models give me different results.

Anyway, thanks for helping

It isn’t about belief. It’s just that the pre-processing part is often to blame when going from Python to Java, so I prefer to get that out of the way.

So even with the same identical input, in a larger model you do get different output. So the next question is: are you running it on CPU or GPU? If on GPU, are you using cuDNN?

Also how much of a difference is the output? Is it entirely different, or just a little bit off?

So I tried models with different numbers of parameters, trained or not. When I have a rather randomly generated model whose output is around 0.5, the Java model matches the Python model pretty much exactly; otherwise, the Java model outputs values either extremely close to 0 or very close to 1.

Another thing I noticed is that optimizationAlgo = stochastic_gradient_descent, even though I defined the optimizer as ‘adam’ in Keras.

Is there anything other than model.conf() that can output a more detailed structure? I did check all the model weights and biases; apart from the LSTM layer, all the others are an exact match.

I’m using 1.0.0-beta6, by the way. When I tried the Keras functional API, it said “cannot be converted to feedforward layer” (or something like that).

I’m just using CPU, no GPU, no cuDNN.

The outputs are entirely different, except when I have a model that predicts everything close to 0.5.

When i tried the keras functional API it says cannot be converted to feedforward layer (or something like that).

In that case you have to use importKerasModelAndWeights instead of importKerasSequentialModelAndWeights.

You can get some more readable detail using System.out.println(model.summary());

Is there a way for you to create a demo project that shows the unexpected behavior? I.e. something that I can go ahead and import in order to fully reproduce it and see what is going on.

Thanks, I did use importKerasModelAndWeights instead of the sequential one.
I also used summary() before, but it doesn’t say anything about the activations or other details.

I will create some demo project for the unexpected behaviour.

But before I do, I’d like to get back to the point about preprocessing.

I used Python to do the preprocessing; afterwards I transposed the second and third dimensions. Then in Java I used Nd4j.createFromNpyFile to create an input INDArray.

The imported model in Java is just the Keras sequential import from the h5 file saved on the Python side, but now with the data also processed on the Python side. By doing this there won’t be any discrepancy in the input, right?

I ran it through the model; all 5 test cases come out very close to 1, completely different from running the same input through the Python model (with dimensions 2 and 3 transposed, of course).

So I think it’s safe to say the problem comes from the model import and definitely not from preprocessing, because both used the same input, and in Java I only ever called the model import and nothing else but model.output(inp, false).

I’ve put the h5 file and the test_matrix.npy file in the GitHub repo; the other two files do nothing but import the same h5 and test_matrix files in Python and Java. You’ll see they give different results.
https://github.com/tintinxue1/dl4j_KerasImport/tree/master

Same input, same files, different results.

I’ve also added a model_toy.h5 file with the same structure (Masking layer + LSTM + Dense layer) but smaller; in this case Java and Python agree on the result.

I’ll have to take a closer look on what exactly is going on here. I’ll report back once I’ve got some conclusions.


@eraly has taken a deeper dive into it, and it is really a problem with the import of a model containing a masking layer.

In DL4J, we usually deal with masking by explicitly masking off inputs, see https://deeplearning4j.org/docs/latest/deeplearning4j-nn-recurrent#importing-time-series-data for more information about that.

By masking your data that way, you should be able to work around the problem.
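To illustrate the idea (a sketch only, not the poster’s actual pipeline): for pre-padded one-hot data, a per-timestep mask can be derived from the input itself, since a timestep is real exactly when some feature is set:

```python
import numpy as np

# Pre-padded one-hot batch in DL4J layout: (batch, features, timesteps)
batch = np.zeros((1, 5, 3))
batch[0, 2, 1] = 1.0  # real data at timesteps 1 and 2; timestep 0 is padding
batch[0, 3, 2] = 1.0

# A timestep counts as real if any feature is set there;
# the mask has shape (batch, timesteps), 1.0 for real steps, 0.0 for padding
mask = (batch.sum(axis=1) != 0).astype(np.float64)

print(mask)  # [[0. 1. 1.]]
```

In DL4J, a mask of this shape is what the network expects alongside the features when masking explicitly, e.g. via the output(...) overload that takes feature and label masks.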

But we are still treating this as a bug, and will keep you updated in the issue you’ve already posted: https://github.com/eclipse/deeplearning4j/issues/8701

Many thanks, using the SequenceRecordReader with alignment actually worked! Although I have to write the inputs into a txt file and read them back again, it does work now. Thank you very much for your help. I hope the bug can be fixed soon.