Loading a pre-trained model fails if a Conv3D layer has the parameter dilation_rate=2

When I try to load a pre-trained Keras functional model with DL4J in Java, loading fails if the model contains a Conv3D layer with dilation_rate=2. With dilation_rate=1, DL4J loads the model successfully.

conv4_d2 = Conv3D(start_neuron*8, (3,3,3), activation = 'relu', padding = 'same', name='conv4_d2_1', dilation_rate=2)(conv4)

Not sure what causes this; I hope I can find the answer here. Thanks.

java.lang.RuntimeException: Op [conv3dnew] execution failed
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1594)
	at org.deeplearning4j.nn.layers.convolution.Convolution3DLayer.preOutput(Convolution3DLayer.java:276)
	at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:489)
	at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:111)
	at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2380)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1741)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1697)
	at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1627)
Caused by: java.lang.RuntimeException: could not create a descriptor for a dilated convolution forward propagation primitive
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1924)
	at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1573)
	... 16 more

@FanDev could you clarify which version you're using and give me some way to run this? Any model I could use would be great.

Yes, I am using

<dl4j.version>1.0.0-beta7</dl4j.version>

in a POM file.

I used modelLoadWeights = KerasModelImport.importKerasModelAndWeights(model_json, model_h5); to load this model in DL4J.
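Expanded into a minimal, self-contained sketch for reference (the file paths are placeholders for the attached json/h5 files):

import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;

public class LoadKerasModel {
    public static void main(String[] args) throws Exception {
        // placeholders for the attached model files
        String model_json = "model_test.json";
        String model_h5 = "model_test.h5";

        // import the Keras functional model as a DL4J ComputationGraph
        ComputationGraph modelLoadWeights =
                KerasModelImport.importKerasModelAndWeights(model_json, model_h5);
        System.out.println(modelLoadWeights.summary());
    }
}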

Here is the network (a basic UNET model):

# imports added for completeness (not shown in the original post)
from tensorflow.keras.layers import Input, Conv3D, MaxPooling3D, UpSampling3D, Concatenate
from tensorflow.keras.models import Model

def model_test(start_neuron=16, input_size=(128, 128, 128, 1), InputName='TEST', DropoutRatio=0.5):

    inputs = Input(input_size, name=InputName)

    conv1 = Conv3D(start_neuron*1, (3, 3, 3), activation='relu', padding='same', name='conv1_1')(inputs)
    conv1 = Conv3D(start_neuron*1, (3, 3, 3), activation='relu', padding='same', name='conv1_2')(conv1)
    pool1 = MaxPooling3D(pool_size=(2, 2, 2), name='pool1')(conv1)

    conv2 = Conv3D(start_neuron*2, (3, 3, 3), activation='relu', padding='same', name='conv2_1')(pool1)
    conv2 = Conv3D(start_neuron*2, (3, 3, 3), activation='relu', padding='same', name='conv2_2')(conv2)
    pool2 = MaxPooling3D(pool_size=(2, 2, 2), name='pool2')(conv2)

    conv3 = Conv3D(start_neuron*4, (3, 3, 3), activation='relu', padding='same', name='conv3_1')(pool2)
    conv3 = Conv3D(start_neuron*4, (3, 3, 3), activation='relu', padding='same', name='conv3_2')(conv3)
    pool3 = MaxPooling3D(pool_size=(2, 2, 2), name='pool3')(conv3)

    conv4 = Conv3D(start_neuron*8, (3, 3, 3), activation='relu', padding='same', name='conv4_1', dilation_rate=2)(pool3)
    conv4 = Conv3D(start_neuron*8, (3, 3, 3), activation='relu', padding='same', name='conv4_2', dilation_rate=2)(conv4)

    upsample1 = UpSampling3D(size=2)(conv4)
    merge5 = Concatenate(axis=4)([upsample1, conv3])
    conv5 = Conv3D(start_neuron*4, (3, 3, 3), activation='relu', padding='same', name='conv5_1')(merge5)
    conv5 = Conv3D(start_neuron*4, (3, 3, 3), activation='relu', padding='same', name='conv5_2')(conv5)

    upsample2 = UpSampling3D(size=2)(conv5)
    merge6 = Concatenate(axis=4)([upsample2, conv2])
    conv6 = Conv3D(start_neuron*2, (3, 3, 3), activation='relu', padding='same', name='conv6_1')(merge6)
    conv6 = Conv3D(start_neuron*2, (3, 3, 3), activation='relu', padding='same', name='conv6_2')(conv6)

    upsample3 = UpSampling3D(size=2)(conv6)
    merge7 = Concatenate(axis=4)([upsample3, conv1])
    conv7 = Conv3D(start_neuron*1, (3, 3, 3), activation='relu', padding='same', name='conv7_1')(merge7)
    conv7 = Conv3D(start_neuron*1, (3, 3, 3), activation='relu', padding='same', name='conv7_2')(conv7)

    conv8 = Conv3D(1, (1, 1, 1), activation='sigmoid', name='conv8')(conv7)

    model = Model(inputs, conv8)
    return model
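For completeness, inference is run roughly like the sketch below; the dummy input and its channels-last shape are assumptions on my part, matching input_size = (128, 128, 128, 1):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// dummy input [minibatch, depth, height, width, channels]; the channels-last
// layout is an assumption matching the Keras input_size = (128, 128, 128, 1)
INDArray input = Nd4j.zeros(1, 128, 128, 128, 1);

// this output(...) call is where the conv3dnew op fails when dilation_rate=2
INDArray prediction = modelLoadWeights.output(input)[0];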

As long as there is no dilation_rate=2 (or dilation_rate is set to 1), the model can be loaded successfully and runs well.

h5 file
json file
Thanks.

@FanDev thanks. Have you tried 1.0.0-M1.1 or snapshots? beta7 is pretty old, and Keras import has received a lot of improvements since then.

Hi, thanks for the quick reply. I rebuilt using 1.0.0-M1.1, but I see the following errors:

org.deeplearning4j.nn.conf.inputs.InvalidInputTypeException: Invalid input: MergeVertex cannot merge CNN3D activations of different width/heights:first [channels,width,height] = [2,32,32], input 1 = [1,32,32]
	at org.deeplearning4j.nn.conf.graph.MergeVertex.getOutputType(MergeVertex.java:127)
	at org.deeplearning4j.nn.modelimport.keras.layers.core.KerasMerge.getOutputType(KerasMerge.java:163)
	at org.deeplearning4j.nn.modelimport.keras.KerasModel.inferOutputTypes(KerasModel.java:473)
	at org.deeplearning4j.nn.modelimport.keras.KerasModel.<init>(KerasModel.java:186)
	at org.deeplearning4j.nn.modelimport.keras.KerasModel.<init>(KerasModel.java:99)
	at org.deeplearning4j.nn.modelimport.keras.utils.KerasModelBuilder.buildModel(KerasModelBuilder.java:311)
	at org.deeplearning4j.nn.modelimport.keras.KerasModelImport.importKerasModelAndWeights(KerasModelImport.java:257)

Then I tried another model (one that loads successfully in 1.0.0-beta7 and has no dilation_rate parameter), but it could not be loaded in 1.0.0-M1.1 either; it fails with the same error as above.

Thanks.

@FanDev thanks for reporting. I'm kind of wondering whether allowing that import was actually a bug. Could you give me a main class with inputs that I could run, so I can compare the two versions? If there is some unintended behavior, I can get that fixed in the process. Thanks!

Thanks, please check PM.

@FanDev I found out that the problem is actually the Concatenate layer. It does something unexpected and actually merges two inputs of different shapes: (None, 32, 32, 32, 64) and (None, 32, 32, 32, 128).
With a merge axis of 4, the channel dimensions sum to 192. Our merge layer's validation assumes that all dimensions are the same, which is why it rejects these inputs. I removed the validation and it managed to import the model somewhat, but it now appears to have issues with another layer's weights. I'll work through that and let you know when a PR is up.
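For reference, what Keras's Concatenate(axis=4) does corresponds to a plain concat along the last axis, where every dimension except the merge axis must match; a minimal ND4J sketch (shapes taken from the model above):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// all dimensions except the merge axis match; the merge axis sums: 64 + 128 = 192
INDArray a = Nd4j.zeros(1, 32, 32, 32, 64);
INDArray b = Nd4j.zeros(1, 32, 32, 32, 128);
INDArray merged = Nd4j.concat(4, a, b);   // shape: [1, 32, 32, 32, 192]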

Thanks. Looking forward to hearing from you.

@FanDev PR here: https://github.com/eclipse/deeplearning4j/pull/9578. If you want the converted model or any other logistics, feel free to DM me.

Thank you, I will start testing and let you know.
Cheers!