Advice on non-square input


I would appreciate any advice on a CNN architecture for non-square input.

Specifically, I’m trying to adapt ResNet50 (coded for 224x224x3 input) to work on smaller, non-square gray images (112x56x1 would be nice).
My inputs are thus 24 times smaller than the original ResNet50 inputs.

I understand the convolution itself can cope with my specific input dimensions. The problem is with the following layers (pooling, for example), for which there seems to be only the nOut() method to size the output. This is fine for a square but not for a rectangle, where width and height differ.

So, is there any way in Deeplearning4J configurations to explicitly provide layer width and height output values rather than a single nOut value?
I’m still searching, but I did not find any in the zoo models.

Thanks in advance for any advice,

The nOut values on CNN layers in DL4J set the number of output channels, so they should be able to cope with non-square inputs anyway.
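
For reference, the spatial dimensions in DL4J come from the input type, not from nOut. A configuration sketch along these lines (not tested against your exact version, and the layer names here are just placeholders) shows where a rectangular shape is declared:

```java
// Sketch: rectangular input in a DL4J ComputationGraph configuration.
// nOut on the conv layer is the channel count only; height and width
// flow from the InputType (here 56 high, 112 wide, 1 channel).
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
        .graphBuilder()
        .addInputs("input")
        .setInputTypes(InputType.convolutional(56, 112, 1)) // height, width, channels
        .addLayer("conv1", new ConvolutionLayer.Builder(3, 3) // kernel (rows, cols)
                .stride(1, 1)
                .convolutionMode(ConvolutionMode.Same)
                .nOut(64) // output channels only
                .build(), "input")
        .setOutputs("conv1")
        .build();
```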

I guess you tried something and got an error? Can you share the full stack trace?

Here is my attempt. I basically copied the ResNet50 source code into a similar ResNet50Customized class and replaced the line:

private int[] inputShape = new int[]{3, 224, 224};

with the following line:

private int[] inputShape = new int[]{1, 56, 112};

Then at run time, I got this output:

Exception in thread "main" org.deeplearning4j.exception.DL4JInvalidConfigException: Invalid configuration for layer (idx=77, name=res4a_branch2b, type=ConvolutionLayer) for width dimension: Invalid input configuration for kernel width. Require 0 < kW <= inWidth + 2*padW; got (kW=3, inWidth=2, padW=0)
Input type = InputTypeConvolutional(h=4,w=2,c=256), kernel = [3, 3], strides = [1, 1], padding = [0, 0], layer size (output channels) = 256, convolution mode = Same
at org.deeplearning4j.nn.conf.layers.InputTypeUtil.getOutputTypeCnnLayers(
at org.deeplearning4j.nn.conf.layers.ConvolutionLayer.getOutputType(
at org.deeplearning4j.nn.conf.graph.LayerVertex.getOutputType(
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.getLayerActivationTypes(
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.addPreProcessors(
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration$
at org.audiveris.omrdataset.train.ResNet50Customized.init(
at org.audiveris.omrdataset.train.Training.process(
at org.audiveris.omrdataset.Main.main(

Thanks for your help,

The width dimension of the input for the 77th layer is smaller than the kernel of the convolution layer. You would have to change the kernel size to 4, 2 (or 2, 4; I’m not sure whether width or height is the first argument) in order to match the input.
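
The condition that fails is the one quoted verbatim in the exception: the kernel must satisfy 0 < kW <= inWidth + 2*padW. A quick plain-Java restatement (my own helper, not DL4J code) makes the failure concrete:

```java
public class KernelFitCheck {
    // Mirrors the condition from the DL4J error message:
    // require 0 < kernel <= inSize + 2*pad for the kernel to fit.
    static boolean kernelFits(int kernel, int inSize, int pad) {
        return kernel > 0 && kernel <= inSize + 2 * pad;
    }

    public static void main(String[] args) {
        // Values from the stack trace: kW=3, inWidth=2, padW=0 -> does not fit.
        System.out.println(kernelFits(3, 2, 0)); // false
        // The height dimension (h=4) still admits a 3x3 kernel.
        System.out.println(kernelFits(3, 4, 0)); // true
    }
}
```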

Thanks for your remarks.

It took me some time to understand that shrinking the input image size also required modifying the network architecture. The reason is that each convolutional layer reduces the spatial size, and these reductions compound across the whole network.
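
To see how the reductions compound, here is a small back-of-the-envelope sketch of my own (for Same mode, the output side is ceil(in / stride)). Starting from 56x112, a handful of stride-2 stages shrink the width below what a 3x3 kernel needs; the exact stage at which the real ResNet50 configuration hits the limit depends on its particular stride and pooling layout, but the trend is the point:

```java
public class DownsampleTrace {
    // Same-mode conv/pool output size: ceil(in / stride).
    static int sameOut(int in, int stride) {
        return (in + stride - 1) / stride;
    }

    public static void main(String[] args) {
        int h = 56, w = 112;
        // Trace six successive stride-2 reductions of a 56x112 input.
        for (int stage = 1; stage <= 6; stage++) {
            h = sameOut(h, 2);
            w = sameOut(w, 2);
            System.out.println("after stage " + stage + ": " + h + "x" + w);
        }
    }
}
```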

So, I discarded ResNet50 for something much smaller: ResNet18, at least as a starting point. And it works.

Thank you all,