Unable to import Many-to-one LSTM Keras model

I am facing an issue importing an LSTM model using Keras (Functional API) v2.3.1 with Tensorflow v2.1.0. I am using the configuration return_sequences=False for my LSTM layer. I am importing the model using:

ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(model,true);

I get the following stack trace:

I have seen in previous issues that this may have been resolved. Also in 1.0.0-alpha, this bug was marked as fixed.

How can I fix this issue?

I am using

  • Deeplearning4j 1.0.0-beta6
  • Platform information OSX

My POM: https://gist.github.com/YusufuShehu/24cf252b5dde5049223ccb979af2b932

I have also put this issue on Github but have not received a response yet.

That does look suspicious. Can you share the exported model or the way you produced it, so we can take a closer look?

There are tests for the previous issues, so that shouldn’t be a regression.

Thanks for looking into this.

Model creation

words_input = Input(shape=(None,), dtype='int32', name='words_input')
words = Embedding(input_dim=wordEmbeddings.shape[0], output_dim=wordEmbeddings.shape[1], weights=[wordEmbeddings],
                  trainable=False)(words_input)


words_trans = TimeDistributed(Dense(100))(words)

casing_input = Input(shape=(None, word_features_cnt,), name='casing_input')

casing_trans = TimeDistributed(Dense(100))(casing_input)

character_input = Input(shape=(None, 52,), name='char_input')

embed_char_out = TimeDistributed(
    Embedding(len(char2Idx), 30, embeddings_initializer=RandomUniform(minval=-0.5, maxval=0.5)), name='char_embedding')(
    character_input)

dropout = Dropout(0.5)(embed_char_out)

conv1d_out = TimeDistributed(Conv1D(kernel_size=3, filters=30, padding='same', activation='tanh', strides=1))(dropout)
maxpool_out = TimeDistributed(MaxPooling1D(52))(conv1d_out)
char = TimeDistributed(Flatten())(maxpool_out)

char = Dropout(0.5)(char)
output = concatenate([words_trans, casing_trans, char])

output = Bidirectional(LSTM(200, return_sequences=False, dropout=0.50, recurrent_dropout=0.25))(output)  # This line is likely causing the issue
output = Dense(len(label2Idx), activation='softmax')(output)
model = Model(inputs=[words_input, casing_input, character_input], outputs=[output])
model.compile(loss='sparse_categorical_crossentropy', optimizer='nadam')
model.summary()
model.save("models/NER_dl4j_functional_many_to_one.hdf5")

Importing the model

public class CModelImport {

    public static ComputationGraph getKerasModel() throws InvalidKerasConfigurationException, IOException, UnsupportedKerasConfigurationException {
        String simpleMlp = new ClassPathResource(
                "NER_dl4j_functional_many_to_one.hdf5").getFile().getPath();

        KerasModelBuilder builder = new KerasModel()
                .modelBuilder()
                .modelHdf5Filename(simpleMlp)
                .enforceTrainingConfig(true);

        KerasModel model = builder.buildModel();
        ComputationGraph graph = model.getComputationGraph();
        return graph;
    }
}

Edit: Made the posted code more readable.

@eraly Would you mind taking a closer look at this?

Of course.

@suf Can you add what wordEmbeddings is to the code so I can recreate a model on my end? Thanks!

# :: Read in word embeddings ::
word2Idx = { }
wordEmbeddings = [ ]

fEmbeddings = open("embeddings/glove.6B.100d.txt", encoding="utf-8")

for line in fEmbeddings:
    split = line.strip().split(" ")
    word = split[0]

    if len(word2Idx) == 0:  # Add padding+unknown
        word2Idx["PADDING_TOKEN"] = len(word2Idx)
        vector = np.zeros(len(split) - 1)  # Zero vector for 'PADDING' word
        wordEmbeddings.append(vector)

        word2Idx["UNKNOWN_TOKEN"] = len(word2Idx)
        vector = np.random.uniform(-0.25, 0.25, len(split) - 1)
        wordEmbeddings.append(vector)

    else:
        vector = np.array([float(num) for num in split[1:]])
        wordEmbeddings.append(vector)
        word2Idx[split[0]] = len(word2Idx)

wordEmbeddings = np.array(wordEmbeddings)

For the sake of saving time reproducing the model, you could make a 2D NumPy array filled with zeros instead of reading in GloVe.
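A minimal sketch of that shortcut. The vocabulary size of 1000 is a hypothetical placeholder (the real size is not given in this thread), and the dimension of 100 is assumed from the glove.6B.100d.txt file name:

```python
import numpy as np

# Hypothetical sizes: vocab_size is a placeholder, embedding_dim is
# assumed from the glove.6B.100d.txt file name.
vocab_size = 1000
embedding_dim = 100

# Zero-filled stand-in for the GloVe matrix, shaped (vocab_size, embedding_dim).
wordEmbeddings = np.zeros((vocab_size, embedding_dim))

# Only the two special tokens from the loader above are kept in the lookup table.
word2Idx = {"PADDING_TOKEN": 0, "UNKNOWN_TOKEN": 1}
```

This keeps wordEmbeddings.shape[0] and wordEmbeddings.shape[1] valid for the Embedding layer without needing the GloVe download.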

@suf I need a python model to look into this.
I am happy to recreate the model on my end if you can give me Python code that runs out of the box. The code above currently doesn’t: word_features_cnt isn’t defined, and neither is char2Idx.
Alternatively, you could link to an uploaded h5 file on Google Drive or here.
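For anyone trying to run the model code in the meantime, the undefined names could be stubbed roughly as follows. Every value here is a hypothetical stand-in, since the real ones are not shown in this thread; label2Idx is included because the output layer uses it and it is not defined above either:

```python
import string

# Hypothetical stand-ins; the original values are not shown in the thread.
word_features_cnt = 8  # assumed number of casing features per token

# Assumed character vocabulary: padding and unknown entries plus printable ASCII.
char2Idx = {"PADDING": 0, "UNKNOWN": 1}
for c in string.ascii_letters + string.digits + string.punctuation:
    char2Idx[c] = len(char2Idx)

# Assumed label set for the softmax output layer.
label2Idx = {"O": 0, "PER": 1, "LOC": 2, "ORG": 3}
```

A model built with these stubs will not match the original weights, so it is only useful for reproducing the import behaviour, not the predictions.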

Please find the Drive link to the hdf5 file attached:
NER_dl4j_functional_many_to_one.hdf5

Wonderful. Thank you.

I’ve requested access to download the link.

You should have access now.

Hi, were you able to import this model?

Hi, yes, I was. It looks like you’ve hit a corner case with the bidirectional LSTM import + return sequence true. It’s a bit of a gnarly bug.

There is also an additional problem related to wrapping embedding models in time distributed.

Does that mean the bug is fixed if you’re able to import it and create the ComputationGraph?

I noticed that this bug was highlighted as fixed with version 1.0.0-alpha:

https://deeplearning4j.org/release-notes#onezerozeroalpha-dl4jkeras

I think it was a new feature in alpha. But like I said, there was a bug with bidirectional LSTMs in the import. I was able to import the graph with my fix, but it errored out on the forward pass because of the second bug I mentioned. I can spend some more time on it and will keep you posted. Best case scenario: I have a quick fix, and then you can build from source or wait until the fix is available on snapshots.

Indeed, I found that using vanilla LSTMs in my model avoids the return_sequences issue.
My project is a bit time-scarce at the moment, so if you think a quick fix would come before the snapshots I would be happy to use it.