Loading a pre-trained model fails if a Conv3D layer has the parameter dilation_rate=2

When I try to load a pre-trained Keras functional model with DL4J in Java, loading fails if the model contains a Conv3D layer with dilation_rate=2. With dilation_rate=1, DL4J loads the model successfully.

conv4_d2 = Conv3D(start_neuron*8, (3,3,3), activation = 'relu', padding = 'same', name='conv4_d2_1', dilation_rate=2)(conv4)

Not sure what causes this; I hope I can find the answer here. Thanks.

java.lang.RuntimeException: Op [conv3dnew] execution failed
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1594)
	at org.deeplearning4j.nn.layers.convolution.Convolution3DLayer.preOutput(Convolution3DLayer.java:276)
	at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:489)
	at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
	at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2380)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1741)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1697)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1627)
Caused by: java.lang.RuntimeException: could not create a descriptor for a dilated convolution forward propagation primitive
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1924)
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1573)
	... 16 more

@FanDev could you clarify which version you're using and give me some way to run this? Any model I could use would be great.

Yes, I am using

<dl4j.version>1.0.0-beta7</dl4j.version>

in a POM file.

I used modelLoadWeights = KerasModelImport.importKerasModelAndWeights(model_json, model_h5); to load this model in DL4J.
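Expanded into a minimal, self-contained sketch for reference (the file paths are placeholders for the attached json/h5 files):

import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;

public class LoadKerasModel {
    public static void main(String[] args) throws Exception {
        // placeholders for the attached model files
        String model_json = "model_test.json";
        String model_h5 = "model_test.h5";

        // import the Keras functional model as a DL4J ComputationGraph
        ComputationGraph modelLoadWeights =
                KerasModelImport.importKerasModelAndWeights(model_json, model_h5);
        System.out.println(modelLoadWeights.summary());
    }
}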

Here is the network (a basic UNET model):

# imports added for completeness (not shown in the original post)
from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, UpSampling3D, Concatenate
from tensorflow.keras.models import Model

def model_test(start_neuron=16, input_size=(128, 128, 128, 1), InputName='TEST', DropoutRatio=0.5):

    inputs = Input(input_size, name=InputName)

    conv1 = Conv3D(start_neuron*1, (3, 3, 3), activation='relu', padding='same', name='conv1_1')(inputs)
    conv1 = Conv3D(start_neuron*1, (3, 3, 3), activation='relu', padding='same', name='conv1_2')(conv1)
    pool1 = MaxPooling3D(pool_size=(2, 2, 2), name='pool1')(conv1)

    conv2 = Conv3D(start_neuron*2, (3, 3, 3), activation='relu', padding='same', name='conv2_1')(pool1)
    conv2 = Conv3D(start_neuron*2, (3, 3, 3), activation='relu', padding='same', name='conv2_2')(conv2)
    pool2 = MaxPooling3D(pool_size=(2, 2, 2), name='pool2')(conv2)

    conv3 = Conv3D(start_neuron*4, (3, 3, 3), activation='relu', padding='same', name='conv3_1')(pool2)
    conv3 = Conv3D(start_neuron*4, (3, 3, 3), activation='relu', padding='same', name='conv3_2')(conv3)
    pool3 = MaxPooling3D(pool_size=(2, 2, 2), name='pool3')(conv3)

    conv4 = Conv3D(start_neuron*8, (3, 3, 3), activation='relu', padding='same', name='conv4_1', dilation_rate=2)(pool3)
    conv4 = Conv3D(start_neuron*8, (3, 3, 3), activation='relu', padding='same', name='conv4_2', dilation_rate=2)(conv4)

    upsample1 = UpSampling3D(size=2)(conv4)
    merge5 = Concatenate(axis=4)([upsample1, conv3])
    conv5 = Conv3D(start_neuron*4, (3, 3, 3), activation='relu', padding='same', name='conv5_1')(merge5)
    conv5 = Conv3D(start_neuron*4, (3, 3, 3), activation='relu', padding='same', name='conv5_2')(conv5)

    upsample2 = UpSampling3D(size=2)(conv5)
    merge6 = Concatenate(axis=4)([upsample2, conv2])
    conv6 = Conv3D(start_neuron*2, (3, 3, 3), activation='relu', padding='same', name='conv6_1')(merge6)
    conv6 = Conv3D(start_neuron*2, (3, 3, 3), activation='relu', padding='same', name='conv6_2')(conv6)

    upsample3 = UpSampling3D(size=2)(conv6)
    merge7 = Concatenate(axis=4)([upsample3, conv1])
    conv7 = Conv3D(start_neuron*1, (3, 3, 3), activation='relu', padding='same', name='conv7_1')(merge7)
    conv7 = Conv3D(start_neuron*1, (3, 3, 3), activation='relu', padding='same', name='conv7_2')(conv7)

    conv8 = Conv3D(1, (1, 1, 1), activation='sigmoid', name='conv8')(conv7)

    model = Model(inputs, conv8)
    return model
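For completeness, inference is run roughly like the sketch below; the dummy input and its channels-last shape are assumptions on my part, matching input_size = (128, 128, 128, 1):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// dummy input [minibatch, depth, height, width, channels]; the channels-last
// layout is an assumption matching the Keras input_size = (128, 128, 128, 1)
INDArray input = Nd4j.zeros(1, 128, 128, 128, 1);

// this output(...) call is where the conv3dnew op fails when dilation_rate=2
INDArray prediction = modelLoadWeights.output(input)[0];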

As long as there is no dilation_rate=2 (or dilation_rate is set to 1), the model can be loaded successfully and runs well.

h5 file
json file
Thanks.

@FanDev thanks. Have you tried 1.0.0-M1.1 or snapshots? beta7 is pretty old, and Keras import has received a lot of improvements since then.

Hi, thanks for the quick reply. I rebuilt using 1.0.0-M1.1, but I see the following errors:

org.deeplearning4j.nn.conf.inputs.InvalidInputTypeException: Invalid input: MergeVertex cannot merge CNN3D activations of different width/heights:first [channels,width,height] = [2,32,32], input 1 = [1,32,32]
	at org.deeplearning4j.nn.conf.graph.MergeVertex.getOutputType(MergeVertex.java:127)
	at org.deeplearning4j.nn.modelimport.keras.layers.core.KerasMerge.getOutputType(KerasMerge.java:163)
	at org.deeplearning4j.nn.modelimport.keras.KerasModel.inferOutputTypes(KerasModel.java:473)
	at org.deeplearning4j.nn.modelimport.keras.KerasModel.<init>(KerasModel.java:186)
	at org.deeplearning4j.nn.modelimport.keras.KerasModel.<init>(KerasModel.java:99)
	at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelBuilder.buildModel(KerasModelBuilder.java:311)
	at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:257)

Then I tried another model (one that loads successfully in 1.0.0-beta7 and has no dilation_rate parameter), but it could not be loaded in 1.0.0-M1.1 either; it fails with the same error as above.

Thanks.

@FanDev thanks for reporting. I'm kind of wondering whether allowing that import was actually a bug. Could you give me a main class with inputs that I could run, so I can compare the two versions? If there is some unintended behavior, I can get that fixed in the process. Thanks!

Thanks, please check PM.

@FanDev I found out that the problem is actually the Concatenate layer. It does something unexpected and actually merges two inputs of different shapes: (None, 32, 32, 32, 64) and (None, 32, 32, 32, 128).
With a merge axis of 4, the channel dimensions sum to 192. Our merge layer's validation assumes that all dimensions are the same, which is why it rejects these inputs. I removed the validation and it managed to import the model somewhat, but it now appears to have issues with another layer's weights. I'll work through that and let you know when a PR is up.
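For reference, what Keras's Concatenate(axis=4) does corresponds to a plain concat along the last axis, where every dimension except the merge axis must match; a minimal ND4J sketch (shapes taken from the model above):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// all dimensions except the merge axis match; the merge axis sums: 64 + 128 = 192
INDArray a = Nd4j.zeros(1, 32, 32, 32, 64);
INDArray b = Nd4j.zeros(1, 32, 32, 32, 128);
INDArray merged = Nd4j.concat(4, a, b);   // shape: [1, 32, 32, 32, 192]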

Thanks. Looking forward to hearing from you.

@FanDev PR here: https://github.com/eclipse/deeplearning4j/pull/9578. If you want the converted model or any other logistics, feel free to DM me.

Thank you, I will start testing and let you know.
Cheers!