Unable to import Many-to-one LSTM Keras model

suf · February 22, 2020, 7:40pm

I am facing an issue importing an LSTM model built with the Keras Functional API (v2.3.1) on TensorFlow v2.1.0. My LSTM layer uses return_sequences=False. I am importing the model using:

ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(model,true);

I get the following stack trace:

https://gist.github.com/YusufuShehu/19cf57ccc908ccd69e6969d9e471cfc4

Exception in thread "main" java.lang.IllegalStateException: Invalid input: expected input of type RNN, got InputTypeFeedForward(400)
	at org.deeplearning4j.nn.conf.preprocessor.RnnToFeedForwardPreProcessor.getOutputType(RnnToFeedForwardPreProcessor.java:104)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.getLayerActivationTypes(ComputationGraphConfiguration.java:521)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.addPreProcessors(ComputationGraphConfiguration.java:449)
	at org.deeplearning4j.nn.conf.ComputationGraphConfiguration$GraphBuilder.build(ComputationGraphConfiguration.java:1201)
	at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraphConfiguration(KerasModel.java:394)
	at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:415)
	at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:404)
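
The 400 in the error message is consistent with Keras' Bidirectional wrapper concatenating the forward and backward LSTM outputs by default (2 x 200 units). The NumPy sketch below only illustrates the shapes involved. It assumes an arbitrary batch of 4 sequences of length 7, and the random arrays are stand-ins for real LSTM activations, not output from Keras or DL4J.

```python
import numpy as np

batch, timesteps, units = 4, 7, 200  # batch and timesteps are arbitrary assumptions

rng = np.random.default_rng(0)
# Stand-ins for the per-step outputs of each direction, stored in processing order.
forward_seq = rng.standard_normal((batch, timesteps, units))
backward_seq = rng.standard_normal((batch, timesteps, units))

# return_sequences=True: one vector per timestep -> 3D (recurrent) activations.
seq_output = np.concatenate([forward_seq, backward_seq], axis=-1)

# return_sequences=False: one vector per sequence -> 2D (feed-forward) activations.
# Each direction contributes the output of the last step it processed.
last_output = np.concatenate(
    [forward_seq[:, -1, :], backward_seq[:, -1, :]], axis=-1
)

print(seq_output.shape)   # (4, 7, 400)
print(last_output.shape)  # (4, 400)
```

Read this way, the stack trace suggests that an RnnToFeedForwardPreProcessor was placed in front of a layer whose input was already the 2D case.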

I have seen in previous issues that this may have been resolved. Also in 1.0.0-alpha, this bug was marked as fixed.

How can I fix this issue?

I am using Deeplearning4j 1.0.0-beta6 on OSX.

My POM is here: POM for Keras Model import (GitHub)

I have also posted this issue on GitHub but have not received a response yet.

treo · February 23, 2020, 10:33am

That does look suspicious. Can you share the exported model or the way you produced it, so we can take a closer look?

There are tests for the previous issues, so that shouldn’t be a regression.

suf · February 24, 2020, 9:37am

Thanks for looking into this.

Model creation

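# Imports assumed for standalone Keras 2.3.1 (not part of the original post)
from keras.models import Model
from keras.layers import (Input, Embedding, TimeDistributed, Dense, Dropout, Conv1D,
                          MaxPooling1D, Flatten, Bidirectional, LSTM, concatenate)
from keras.initializers import RandomUniform

words_input = Input(shape=(None,), dtype='int32', name='words_input')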
words = Embedding(input_dim=wordEmbeddings.shape[0], output_dim=wordEmbeddings.shape[1], weights=[wordEmbeddings],
                  trainable=False)(words_input)


words_trans = TimeDistributed(Dense(100))(words)

casing_input = Input(shape=(None, word_features_cnt,), name='casing_input')

casing_trans = TimeDistributed(Dense(100))(casing_input)

character_input = Input(shape=(None, 52,), name='char_input')

embed_char_out = TimeDistributed(
    Embedding(len(char2Idx), 30, embeddings_initializer=RandomUniform(minval=-0.5, maxval=0.5)), name='char_embedding')(
    character_input)

dropout = Dropout(0.5)(embed_char_out)

conv1d_out = TimeDistributed(Conv1D(kernel_size=3, filters=30, padding='same', activation='tanh', strides=1))(dropout)
maxpool_out = TimeDistributed(MaxPooling1D(52))(conv1d_out)
char = TimeDistributed(Flatten())(maxpool_out)

char = Dropout(0.5)(char)
output = concatenate([words_trans, casing_trans, char])

output = Bidirectional(LSTM(200, return_sequences=False, dropout=0.50, recurrent_dropout=0.25))(output)  # This line is likely causing the issue
output = Dense(len(label2Idx), activation='softmax')(output)
model = Model(inputs=[words_input, casing_input, character_input], outputs=[output])
model.compile(loss='sparse_categorical_crossentropy', optimizer='nadam')
model.summary()
model.save("models/NER_dl4j_functional_many_to_one.hdf5")

Importing the model

public class CModelImport {

    public static ComputationGraph getKerasModel() throws InvalidKerasConfigurationException, IOException, UnsupportedKerasConfigurationException {
        String simpleMlp = new ClassPathResource(
                "NER_dl4j_functional_many_to_one.hdf5").getFile().getPath();

        KerasModelBuilder builder = new KerasModel()
                .modelBuilder()
                .modelHdf5Filename(simpleMlp)
                .enforceTrainingConfig(true);

        KerasModel model = builder.buildModel();
        ComputationGraph graph = model.getComputationGraph();
        return graph;
    }
}

Edit: Made the posted code more readable.

treo · February 24, 2020, 11:23am

@eraly Would you mind taking a closer look at this?

eraly · February 24, 2020, 5:35pm

Of course

eraly · February 24, 2020, 5:51pm

@suf Can you add what wordEmbeddings is to the code so I can recreate a model on my end? Thanks!

suf · February 25, 2020, 9:38am

import numpy as np

# :: Read in word embeddings ::
word2Idx = { }
wordEmbeddings = [ ]

fEmbeddings = open("embeddings/glove.6B.100d.txt", encoding="utf-8")

for line in fEmbeddings:
    split = line.strip().split(" ")
    word = split[0]

    if len(word2Idx) == 0:  # Add padding+unknown
        word2Idx["PADDING_TOKEN"] = len(word2Idx)
        vector = np.zeros(len(split) - 1)  # Zero vector for 'PADDING' word
        wordEmbeddings.append(vector)

        word2Idx["UNKNOWN_TOKEN"] = len(word2Idx)
        vector = np.random.uniform(-0.25, 0.25, len(split) - 1)
        wordEmbeddings.append(vector)

    # No `else` here: with one, the first GloVe line would be skipped
    # after the two special tokens are added.
    vector = np.array([float(num) for num in split[1:]])
    wordEmbeddings.append(vector)
    word2Idx[word] = len(word2Idx)

wordEmbeddings = np.array(wordEmbeddings)

For the sake of saving time reproducing the model, you could make a 2D NumPy array filled with zeros instead of reading in GloVe.
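
A minimal sketch of that shortcut, assuming a hypothetical vocabulary of 1,000 rows and the 100 dimensions of glove.6B.100d. The names vocab_size and embedding_dim are made up for illustration:

```python
import numpy as np

vocab_size = 1000    # hypothetical; the real value depends on the GloVe file
embedding_dim = 100  # matches glove.6B.100d

# Zero-filled stand-in for the GloVe matrix; enough to build and save the model.
wordEmbeddings = np.zeros((vocab_size, embedding_dim))

print(wordEmbeddings.shape)  # (1000, 100)
```

The Embedding layer in the model code reads its input_dim and output_dim from wordEmbeddings.shape, so the stand-in only needs to be two-dimensional.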

eraly · February 26, 2020, 3:27pm

@suf I need a Python model to look into this.
I am happy to recreate the model on my end if you can give me Python code that runs out of the box. The code above currently doesn’t: word_features_cnt isn’t defined, and neither is char2Idx.
Or you could link to an uploaded h5 file on Google Drive or here.
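
For anyone reproducing the model without the original data, the missing names could be filled with stand-ins along these lines. Every value here is a hypothetical placeholder; the real project's feature count, character set, and labels are not shown in this thread.

```python
import string

# All values below are hypothetical placeholders, not taken from the original project.
word_features_cnt = 8  # width of the per-token casing feature vector

# Character vocabulary: two special entries plus ASCII letters and digits.
char2Idx = {"PADDING": 0, "UNKNOWN": 1}
for ch in string.ascii_letters + string.digits:
    char2Idx[ch] = len(char2Idx)

# Label vocabulary for a toy tag set.
label2Idx = {"O": 0, "B-PER": 1, "I-PER": 2}

print(len(char2Idx), len(label2Idx))  # 64 3
```

These only need to be consistent with the layer definitions: len(char2Idx) sets the character Embedding's input size and len(label2Idx) sets the width of the final Dense layer.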

suf · February 26, 2020, 4:26pm

Please find the Drive link to the hdf5 file below:
NER_dl4j_functional_many_to_one.hdf5

eraly · February 26, 2020, 5:59pm

Wonderful. Thank you.

eraly · February 26, 2020, 6:00pm

I’ve requested access to download the link.

suf · February 26, 2020, 6:30pm

You should have access now.

suf · February 27, 2020, 5:28pm

Hi, were you able to import this model?

eraly · February 27, 2020, 5:51pm

Hi, yes, I was. It looks like you’ve hit a corner case with the bidirectional LSTM import + return_sequences true. Bit of a gnarly bug.

eraly · February 28, 2020, 12:59am

There is also an additional problem related to wrapping Embedding layers in TimeDistributed.

suf · February 28, 2020, 10:37am

Does that mean the bug is fixed if you’re able to import it and create the ComputationGraph?

I noticed that this bug was highlighted as fixed with version 1.0.0-alpha:

https://deeplearning4j.org/release-notes#onezerozeroalpha-dl4jkeras

eraly · March 2, 2020, 5:35pm

I think it was a new feature in alpha. But like I said, there was a bug with bidirectional LSTMs in the import. I was able to import the graph with my fix, but it errored out on the forward pass because of the second bug I mentioned. I can spend some more time on it and will keep you posted. Best case scenario: I have a quick fix, and then you can build from source or wait until the fix is available on snapshots.

suf · March 3, 2020, 10:30am

Indeed, I found that using vanilla LSTMs in my model avoids the return_sequences issue.
My project is a bit time-scarce at the moment, so if you think a quick fix would come before the snapshots I would be happy to use it.
