I would appreciate any advice on a CNN architecture for non-square input.
Specifically, I’m trying to adapt ResNet50 (coded for 224x224x3 input) to work on smaller, non-square grayscale images (112x56x1 would be nice).
My inputs are thus 24 times smaller than the original ResNet50 inputs.
I understand the convolution itself can cope with my specific input dimensions. The problem is with the following layers (pooling for example), for which there seems to be only the nOut() method to size the output. This is OK for a square but not for a rectangle, for which width and height values are different.
So, is there any way in Deeplearning4J configurations to explicitly provide layer width and height output values rather than a single nOut value?
I’m still searching, but I have not found any such option in the zoo models so far.
The width dimension of the input for the 77th layer is smaller than the kernel of the convolution layer. You would have to change the kernel size to 4, 2 (or 2, 4; I’m not sure whether width or height is the first argument) in order to match the input.
It took me some time to understand that shrinking the input image size also requires modifying the network architecture. The reason is the cumulative size reduction performed, one after the other, by all the strided convolution and pooling layers.
So, I discarded ResNet50 for something much smaller: ResNet18, at least as a starting point. And it works.
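For anyone hitting the same issue, here is a rough sketch in plain Java (no Deeplearning4J dependency) of how those size reductions add up. It assumes a typical ResNet50-style downsampling path (a 7x7 stride-2 stem convolution, a 3x3 stride-2 max pooling, then three stride-2 stages) and treats 112 as the height and 56 as the width; the exact kernels and padding in the zoo model may differ. The class and method names are made up for illustration.

```java
public class ConvSizeSketch {
    // Output length along one axis for a convolution or pooling layer:
    // floor((in + 2*pad - kernel) / stride) + 1
    public static int outSize(int in, int kernel, int stride, int pad) {
        return (in + 2 * pad - kernel) / stride + 1;
    }

    // ASSUMPTION: {kernel, stride, pad} for each downsampling step of a
    // typical ResNet50; not read from the Deeplearning4J zoo model.
    static final int[][] STEPS = {
        {7, 2, 3}, // stem convolution
        {3, 2, 1}, // max pooling
        {1, 2, 0}, // first block of stage 3
        {1, 2, 0}, // first block of stage 4
        {1, 2, 0}, // first block of stage 5
    };

    // Applies every downsampling step to both axes and returns {height, width}.
    public static int[] finalSize(int height, int width) {
        int h = height;
        int w = width;
        for (int[] s : STEPS) {
            h = outSize(h, s[0], s[1], s[2]);
            w = outSize(w, s[0], s[1], s[2]);
        }
        return new int[] {h, w};
    }

    public static void main(String[] args) {
        int[] square = finalSize(224, 224);
        int[] rect = finalSize(112, 56);
        System.out.println("224x224 -> " + square[0] + "x" + square[1]);
        System.out.println("112x56  -> " + rect[0] + "x" + rect[1]);
    }
}
```

Under these assumptions the 224x224 input ends up as a 7x7 feature map, while the 112x56 input ends up as 4x2, which is smaller than a 7x7 kernel in the last layers and consistent with the suggestion above to use a 4, 2 kernel.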