suf
February 22, 2020, 7:40pm
1
I am facing an issue importing an LSTM model built with Keras (functional API) v2.3.1 and TensorFlow v2.1.0. My LSTM layer uses return_sequences=False. I am importing the model using:
ComputationGraph graph = KerasModelImport.importKerasModelAndWeights(model,true);
I get the following stack trace:
Exception in thread "main" java.lang.IllegalStateException: Invalid input: expected input of type RNN, got InputTypeFeedForward(400)
at org.deeplearning4j.nn.conf.preprocessor.RnnToFeedForwardPreProcessor.getOutputType(RnnToFeedForwardPreProcessor.java:104)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.getLayerActivationTypes(ComputationGraphConfiguration.java:521)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.addPreProcessors(ComputationGraphConfiguration.java:449)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration$GraphBuilder.build(ComputationGraphConfiguration.java:1201)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraphConfiguration(KerasModel.java:394)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:415)
at org.deeplearning4j.nn.modelimport.keras.KerasModel.getComputationGraph(KerasModel.java:404)
I have seen in previous issues that this may have been resolved. Also in 1.0.0-alpha, this bug was marked as fixed.
How can I fix this issue?
I am using:
Deeplearning4j 1.0.0-beta6
Platform: OSX
POM: POM for Keras Model import · GitHub
I have also put this issue on Github but have not received a response yet.
treo
February 23, 2020, 10:33am
2
That does look suspicious. Can you share the exported model or the way you produced it, so we can take a closer look?
There are tests for the previous issues, so that shouldn’t be a regression.
suf
February 24, 2020, 9:37am
3
Thanks for looking into this.
Model creation
words_input = Input(shape=(None,), dtype='int32', name='words_input')
words = Embedding(input_dim=wordEmbeddings.shape[0], output_dim=wordEmbeddings.shape[1],
                  weights=[wordEmbeddings], trainable=False)(words_input)
words_trans = TimeDistributed(Dense(100))(words)

casing_input = Input(shape=(None, word_features_cnt,), name='casing_input')
casing_trans = TimeDistributed(Dense(100))(casing_input)

character_input = Input(shape=(None, 52,), name='char_input')
embed_char_out = TimeDistributed(
    Embedding(len(char2Idx), 30, embeddings_initializer=RandomUniform(minval=-0.5, maxval=0.5)),
    name='char_embedding')(character_input)
dropout = Dropout(0.5)(embed_char_out)
conv1d_out = TimeDistributed(Conv1D(kernel_size=3, filters=30, padding='same', activation='tanh', strides=1))(dropout)
maxpool_out = TimeDistributed(MaxPooling1D(52))(conv1d_out)
char = TimeDistributed(Flatten())(maxpool_out)
char = Dropout(0.5)(char)

output = concatenate([words_trans, casing_trans, char])
output = Bidirectional(LSTM(200, return_sequences=False, dropout=0.50, recurrent_dropout=0.25))(output)  # this line is likely causing the issue
output = Dense(len(label2Idx), activation='softmax')(output)

model = Model(inputs=[words_input, casing_input, character_input], outputs=[output])
model.compile(loss='sparse_categorical_crossentropy', optimizer='nadam')
model.summary()
model.save("models/NER_dl4j_functional_many_to_one.hdf5")
Importing the model into DL4J
public class CModelImport {
    public static ComputationGraph getKerasModel()
            throws InvalidKerasConfigurationException, IOException, UnsupportedKerasConfigurationException {
        String simpleMlp = new ClassPathResource("NER_dl4j_functional_many_to_one.hdf5").getFile().getPath();
        KerasModelBuilder builder = new KerasModel()
                .modelBuilder()
                .modelHdf5Filename(simpleMlp)
                .enforceTrainingConfig(true);
        KerasModel model = builder.buildModel();
        ComputationGraph graph = model.getComputationGraph();
        return graph;
    }
}
Edit: Made the posted code more readable.
treo
February 24, 2020, 11:23am
4
@eraly Would you mind taking a closer look at this?
eraly
February 24, 2020, 5:51pm
6
@suf Can you add what wordEmbeddings is to the code so I can recreate a model on my end? Thanks!
suf
February 25, 2020, 9:38am
7
import numpy as np

# :: Read in word embeddings ::
word2Idx = {}
wordEmbeddings = []
fEmbeddings = open("embeddings/glove.6B.100d.txt", encoding="utf-8")
for line in fEmbeddings:
    split = line.strip().split(" ")
    word = split[0]
    if len(word2Idx) == 0:  # Add padding + unknown
        word2Idx["PADDING_TOKEN"] = len(word2Idx)
        vector = np.zeros(len(split) - 1)  # Zero vector for 'PADDING' word
        wordEmbeddings.append(vector)
        word2Idx["UNKNOWN_TOKEN"] = len(word2Idx)
        vector = np.random.uniform(-0.25, 0.25, len(split) - 1)
        wordEmbeddings.append(vector)
    else:
        vector = np.array([float(num) for num in split[1:]])
        wordEmbeddings.append(vector)
        word2Idx[split[0]] = len(word2Idx)
wordEmbeddings = np.array(wordEmbeddings)
For the sake of saving time reproducing the model, you could make a 2D NumPy array filled with zeros instead of reading in GloVe.
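Following that suggestion, a minimal stand-in for wordEmbeddings could look like this (the vocabulary size of 10,000 is arbitrary; the dimension of 100 is assumed to match glove.6B.100d):

```python
import numpy as np

# Hypothetical stand-in for the GloVe matrix: 10,000 "words", 100 dimensions.
# Row 0 stays the zero PADDING vector, row 1 is a random UNKNOWN vector.
vocab_size, dim = 10000, 100
wordEmbeddings = np.zeros((vocab_size, dim))
wordEmbeddings[1] = np.random.uniform(-0.25, 0.25, dim)
print(wordEmbeddings.shape)  # (10000, 100)
```

This keeps the Embedding layer's input_dim and output_dim identical to the GloVe-based version, so the saved model has the same architecture.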
eraly
February 26, 2020, 3:27pm
8
@suf I need a Python model to look into this.
I am happy to recreate the model on my end if you can give me Python code that runs out of the box. The code above currently doesn't: word_features_cnt isn't defined, and neither is char2Idx.
Or you could link to an uploaded h5 on Google Drive or here.
suf
February 26, 2020, 4:26pm
9
Please find the hdf5 file attached as a Drive link:
NER_dl4j_functional_many_to_one.hdf5
eraly
February 26, 2020, 6:00pm
11
I’ve requested access to download the link.
suf
February 26, 2020, 6:30pm
12
You should have access now.
suf
February 27, 2020, 5:28pm
13
Hi, were you able to import this model?
eraly
February 27, 2020, 5:51pm
14
Hi, yes, I was. It looks like you've hit a corner case in the bidirectional LSTM import with return_sequences=False. Bit of a gnarly bug.
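For context, a quick shape sketch of where the InputTypeFeedForward(400) in the stack trace comes from (batch and timestep sizes here are arbitrary; the 200 units come from the Bidirectional LSTM in the model above):

```python
import numpy as np

batch, timesteps, units = 2, 5, 200

# return_sequences=True -> one vector per timestep (many-to-many);
# the Bidirectional wrapper doubles the unit count.
seq_out = np.zeros((batch, timesteps, 2 * units))  # RNN-type activation

# return_sequences=False -> only the final timestep survives (many-to-one).
last_out = np.zeros((batch, 2 * units))            # feed-forward-type activation

# 2 * 200 = 400, matching InputTypeFeedForward(400) in the stack trace:
# the importer sees a feed-forward activation where it expects an RNN one.
print(last_out.shape[-1])  # 400
```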
eraly
February 28, 2020, 12:59am
15
There is also an additional problem related to wrapping Embedding layers in TimeDistributed.
suf
February 28, 2020, 10:37am
16
Does that mean the bug is fixed if you’re able to import it and create the ComputationGraph?
I noticed that this bug was highlighted as fixed with version 1.0.0-alpha:
https://deeplearning4j.org/release-notes#onezerozeroalpha-dl4jkeras
eraly
March 2, 2020, 5:35pm
17
I think it was a new feature in alpha. But like I said, there was a bug with bidirectional LSTMs in the import. I was able to import the graph with my fix, but it errored out on the forward pass because of the second bug I mentioned. I can spend some more time on it and will keep you posted. Best case scenario: I have a quick fix, and then you can build from source or wait until the fix is available on snapshots.
suf
March 3, 2020, 10:30am
18
Indeed, I found that using vanilla LSTMs in my model avoids the return_sequences issue.
My project is a bit pressed for time at the moment, so if you think a quick fix would arrive before the snapshots, I would be happy to use it.