Advice on non-square input

Hello,

I would appreciate any advice on a CNN architecture for non-square input.

Specifically, I'm trying to adapt ResNet50 (coded for 224x224x3 input) to work on smaller, non-square grayscale images (112x56x1 would be nice).
My inputs are thus 24 times smaller than the original ResNet50 inputs.

I understand the convolution itself can cope with my specific input dimensions. The problem is with the subsequent layers (pooling, for example), for which there seems to be only the nOut() method to size the output. That is fine for a square, but not for a rectangle, whose width and height differ.

So, is there any way in Deeplearning4j configurations to explicitly provide layer output width and height values rather than a single nOut value?
I'm still searching, but I have not found any example in the zoo models.

Thanks in advance for any advice,
/Hervé

The nOut values on CNN layers in DL4J set the number of output channels, so the layers themselves should be able to cope with non-square inputs anyway.
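
For context: in DL4J the input height and width are normally declared once via the InputType on the graph builder, and every layer's spatial output is derived from that; nOut never encodes them. A minimal sketch (the toy layer stack is made up, only the shape plumbing matters):

import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class NonSquareSketch {
    public static void main(String[] args) {
        ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
                .graphBuilder()
                .addInputs("input")
                // nOut is the channel count only; height/width flow through
                // from the InputType declared below.
                .addLayer("conv1", new ConvolutionLayer.Builder(3, 3)
                        .nIn(1).nOut(16).stride(1, 1).build(), "input")
                .addLayer("pool1", new SubsamplingLayer.Builder(2, 2)
                        .stride(2, 2).build(), "conv1")
                .addLayer("out", new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                        .nOut(10).activation(Activation.SOFTMAX).build(), "pool1")
                .setOutputs("out")
                // Rectangular input: height=56, width=112, channels=1
                .setInputTypes(InputType.convolutional(56, 112, 1))
                .build();
        System.out.println(conf.toJson());
    }
}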

I guess you tried something and got an error? Can you share the full stack trace?

Here is my attempt. I basically copied the ResNet50 source code into a similar ResNet50Customized class and replaced the line:

private int[] inputShape = new int[]{3, 224, 224};

with the following line:

private int[] inputShape = new int[]{1, 56, 112};

Then at run time, I got this output:

Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidConfigException: Invalid configuration for layer (idx=77, name=res4a_branch2b, type=ConvolutionLayer) for width dimension: Invalid input configuration for kernel width. Require 0 < kW <= inWidth + 2*padW; got (kW=3, inWidth=2, padW=0)
Input type = InputTypeConvolutional(h=4,w=2,c=256), kernel = [3, 3], strides = [1, 1], padding = [0, 0], layer size (output channels) = 256, convolution mode = Same
at org.deeplearning4j.nn.conf.layers.InputTypeUtil.getOutputTypeCnnLayers(InputTypeUtil.java:329)
at org.deeplearning4j.nn.conf.layers.ConvolutionLayer.getOutputType(ConvolutionLayer.java:192)
at org.deeplearning4j.nn.conf.graph.LayerVertex.getOutputType(LayerVertex.java:131)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.getLayerActivationTypes(ComputationGraphConfiguration.java:536)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.addPreProcessors(ComputationGraphConfiguration.java:449)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration$GraphBuilder.build(ComputationGraphConfiguration.java:1201)
at org.audiveris.omrdataset.train.ResNet50Customized.init(ResNet50Customized.java:108)
at org.audiveris.omrdataset.train.Training.process(Training.java:241)
at org.audiveris.omrdataset.Main.main(Main.java:83)

Thanks for your help,
/Hervé

The width dimension of the input to layer 77 (res4a_branch2b) is smaller than the kernel of that convolution layer. You would have to change the kernel size to 4, 2 (or 2, 4; I'm not sure whether width or height is the first argument) in order to match the input.
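
As a side note, once you have a configuration that does build (e.g. a shallower variant), you can print each layer's output shape and see exactly where a dimension bottoms out, using getLayerActivationTypes (the same method that appears in your stack trace). A small sketch; the wrapper class and method name are just mine:

import java.util.Map;
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;

public class ShapeDebug {
    /** Print the activation type (including height and width) produced by each layer. */
    public static void printActivationShapes(ComputationGraphConfiguration conf,
                                             InputType input) {
        Map<String, InputType> types = conf.getLayerActivationTypes(input);
        types.forEach((name, type) -> System.out.println(name + " -> " + type));
    }
}

For example: printActivationShapes(conf, InputType.convolutional(56, 112, 1)) before any training.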

Thanks for your remarks.

It took me some time to understand that shrinking the input image size also required modifying the network architecture. The reason is the combined size reductions performed, one after another, by the successive convolutional and pooling layers.
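
For the record, here is the back-of-the-envelope check that made it click for me. With convolution mode Same, each stride-2 layer roughly halves each side (out = ceil(in / stride)), so a 112x56 input runs out of pixels after only a few downsampling stages. A throwaway sketch:

// How fast a 112x56 input shrinks under repeated stride-2 downsampling;
// with ConvolutionMode.Same, out = ceil(in / stride).
public class ShrinkCheck {
    public static void main(String[] args) {
        int a = 112, b = 56;  // the two spatial sides of the input
        for (int stage = 1; stage <= 5; stage++) {
            a = (a + 1) / 2;  // ceil(a / 2)
            b = (b + 1) / 2;  // ceil(b / 2)
            System.out.printf("after stride-2 stage %d: %d x %d%n", stage, a, b);
        }
        // Stage 5 prints "4 x 2": the very h=4, w=2 reported in the exception,
        // too small for the next 3x3 kernel.
    }
}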

So, I discarded ResNet50 for something much smaller: ResNet18, at least as a starting point. And it works.

Thank you all,
/Hervé