Generic windowing functions for 2D images?

My question is virtually identical to this SO question, but for DL4J. In short, can I define a sliding autoencoder/operator in DL4J? This could apply to the input layer or even to the output of a Conv2D layer.

The proposed solution for TensorFlow was to use the extract_image_patches function. I see that it is available in DL4J as well, so that is my fallback.

Is that the idiomatic way to simulate a windowing operation? It seems like it would have non-trivial performance implications, though I haven’t tested it to verify.
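To make the performance concern concrete, here is a minimal pure-Java sketch of what an extract_image_patches-style op computes (this is an illustration of the semantics, not the actual ND4J implementation): every kH×kW window over an H×W×C image becomes one flattened patch, so at stride 1 the output duplicates the input data roughly kH·kW times.

```java
import java.util.ArrayList;
import java.util.List;

public class PatchExtract {
    /** Extract every kH x kW window (strides sH, sW, VALID padding)
     *  from an H x W x C image as a flattened patch of length kH*kW*C. */
    static List<float[]> extractPatches(float[][][] img, int kH, int kW, int sH, int sW) {
        int h = img.length, w = img[0].length, c = img[0][0].length;
        List<float[]> patches = new ArrayList<>();
        for (int y = 0; y + kH <= h; y += sH) {
            for (int x = 0; x + kW <= w; x += sW) {
                float[] patch = new float[kH * kW * c];
                int i = 0;
                for (int dy = 0; dy < kH; dy++)
                    for (int dx = 0; dx < kW; dx++)
                        for (int ch = 0; ch < c; ch++)
                            patch[i++] = img[y + dy][x + dx][ch];
                patches.add(patch);
            }
        }
        return patches;
    }

    public static void main(String[] args) {
        // 8x8 single-channel image, 3x3 windows, stride 1:
        float[][][] img = new float[8][8][1];
        List<float[]> p = extractPatches(img, 3, 3, 1, 1);
        // (8-3+1)^2 = 36 positions, each patch holding 9 values,
        // i.e. 324 stored values vs. 64 in the input.
        System.out.println(p.size() + " patches of length " + p.get(0).length);
    }
}
```

That blow-up is the cost of materializing the windows explicitly; a fused convolution-style op avoids it by never storing the patches.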

I am unsure whether this forum or Stack Overflow is the correct place to ask this. Sorry if this is the wrong place to post general API questions.

Usually this is the better place to ask questions about anything related to DL4J, as it allows for a better conversation than Stack Overflow.

Do I understand your question correctly: you want, in principle, to slide a window over a picture and use each possible window position over the entire picture for unsupervised training?

There are a few ways I can think of doing that, but to suggest something useful, I need to know a little bit more:

  • Are you going to train just on top of a single picture?
  • If not, what kind of mini batch composition are you trying to get?
  • What are you trying to achieve with that overall? What is the final goal there?

Do I understand your question correctly: you want, in principle, to slide a window over a picture and use each possible window position over the entire picture for unsupervised training?

Exactly. But I do want to apply this method over multiple layers. In principle, the input could be an image with N channels or the output of the previous layer with N features. The key is preserving locality.

I think you understand the core of what I want to do. This is purely an academic/learning experiment in which I am trying to see whether it’s possible to build a hierarchy of “grouped” features in order to discover structure. Something akin to capsule networks, but I want to try a number of alternative methods for defining how those capsules are expressed.

As for training, I expect there is some flexibility here, but ultimately each image would map to a tree of sparse features, where each feature has a small “pose matrix” (to use the capsule terminology). As for how many images to process in a batch, I don’t think I have any restrictions there other than performance.
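The multi-layer idea above can be sketched roughly as follows. This is a hypothetical illustration, not DL4J code: each layer slides a window over a grid of feature vectors, encodes every window with some unsupervised encoder (here just a placeholder lambda), and emits a smaller grid of new feature vectors, so locality is preserved from layer to layer.

```java
import java.util.Arrays;
import java.util.function.Function;

public class WindowHierarchy {
    /** Slide a k x k window (stride 1, VALID) over an H x W grid of
     *  feature vectors; encode each flattened window into a new
     *  feature vector, producing an (H-k+1) x (W-k+1) output grid. */
    static float[][][] windowLayer(float[][][] in, int k,
                                   Function<float[], float[]> encoder) {
        int h = in.length, w = in[0].length, inC = in[0][0].length;
        int oh = h - k + 1, ow = w - k + 1;
        float[][][] out = new float[oh][ow][];
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++) {
                float[] win = new float[k * k * inC];
                int i = 0;
                for (int dy = 0; dy < k; dy++)
                    for (int dx = 0; dx < k; dx++)
                        for (int c = 0; c < inC; c++)
                            win[i++] = in[y + dy][x + dx][c];
                out[y][x] = encoder.apply(win); // one feature vector per position
            }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder "autoencoder": truncate each window to 4 features.
        Function<float[], float[]> enc = win -> Arrays.copyOf(win, 4);
        float[][][] l0 = new float[16][16][3];      // image with 3 channels
        float[][][] l1 = windowLayer(l0, 3, enc);   // 14x14 grid of features
        float[][][] l2 = windowLayer(l1, 3, enc);   // 12x12 grid of features
        System.out.println(l2.length + "x" + l2[0].length + "x" + l2[0][0].length);
    }
}
```

In a real experiment the placeholder encoder would be the trained autoencoder for that layer, and each emitted feature vector could carry the small per-feature “pose” described above.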