Generic windowing functions for 2D images?

My question is virtually identical to this SO question, but for DL4J. In short, can I define a sliding autoencoder/operator in DL4J? This could apply to the input layer or even to the output of a Conv2D layer.

The proposed solution for TensorFlow was to use the extract_image_patches function. I see that it is available in DL4J as well, so that is my fallback.

Is that the idiomatic way to simulate a windowing operation? It seems like it would have non-trivial performance implications, though I haven’t tested it to verify.
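To make the performance concern concrete, here is a minimal pure-Java sketch of what an extract_image_patches-style op computes (this is an illustration of the semantics, not the actual ND4J implementation): every kH×kW window over an H×W×C image becomes one flattened patch, so at stride 1 the output duplicates the input data roughly kH·kW times.

```java
import java.util.ArrayList;
import java.util.List;

public class PatchExtract {
    /** Extract every kH x kW window (strides sH, sW, VALID padding)
     *  from an H x W x C image as a flattened patch of length kH*kW*C. */
    static List<float[]> extractPatches(float[][][] img, int kH, int kW, int sH, int sW) {
        int h = img.length, w = img[0].length, c = img[0][0].length;
        List<float[]> patches = new ArrayList<>();
        for (int y = 0; y + kH <= h; y += sH) {
            for (int x = 0; x + kW <= w; x += sW) {
                float[] patch = new float[kH * kW * c];
                int i = 0;
                for (int dy = 0; dy < kH; dy++)
                    for (int dx = 0; dx < kW; dx++)
                        for (int ch = 0; ch < c; ch++)
                            patch[i++] = img[y + dy][x + dx][ch];
                patches.add(patch);
            }
        }
        return patches;
    }

    public static void main(String[] args) {
        // 8x8 single-channel image, 3x3 windows, stride 1:
        float[][][] img = new float[8][8][1];
        List<float[]> p = extractPatches(img, 3, 3, 1, 1);
        // (8-3+1)^2 = 36 positions, each patch holding 9 values,
        // i.e. 324 stored values vs. 64 in the input.
        System.out.println(p.size() + " patches of length " + p.get(0).length);
    }
}
```

That blow-up is the cost of materializing the windows explicitly; a fused convolution-style op avoids it by never storing the patches.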

I am unsure whether this forum or Stack Overflow is the correct place to ask this. Sorry if this is the wrong place to post general API questions.

Usually this is the better place to ask questions about anything related to DL4J, as it allows for a better conversation than Stack Overflow.

Do I understand your question correctly: you want, in principle, to slide a window over a picture and use each possible window position over the entire picture for unsupervised training?

There are a few ways I can think of doing that, but to suggest something useful, I need to know a little bit more:

  • Are you going to train just on top of a single picture?
  • If not, what kind of mini batch composition are you trying to get?
  • What are you trying to achieve with that overall? What is the final goal there?

Do I understand your question correctly: you want, in principle, to slide a window over a picture and use each possible window position over the entire picture for unsupervised training?

Exactly. But I do want to apply this method over multiple layers. In principle, the input could be an image with N channels or the output of the previous layer with N features. The key is preserving locality.

I think you understand the core of what I want to do. This is purely an academic/learning experiment in which I am trying to see whether it’s possible to build a hierarchy of “grouped” features in order to discover structure. Something akin to capsule networks, but I want to try a number of alternative methods for defining how those capsules are expressed.

As for training, I expect there is some flexibility here, but ultimately each image would map to a tree of sparse features, where each feature has a small “pose matrix” (to use the capsule terminology). As for how many images to process in a batch, I don’t think I have any restrictions there other than performance.
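The multi-layer idea above can be sketched roughly as follows. This is a hypothetical illustration, not DL4J code: each layer slides a window over a grid of feature vectors, encodes every window with some unsupervised encoder (here just a placeholder lambda), and emits a smaller grid of new feature vectors, so locality is preserved from layer to layer.

```java
import java.util.Arrays;
import java.util.function.Function;

public class WindowHierarchy {
    /** Slide a k x k window (stride 1, VALID) over an H x W grid of
     *  feature vectors; encode each flattened window into a new
     *  feature vector, producing an (H-k+1) x (W-k+1) output grid. */
    static float[][][] windowLayer(float[][][] in, int k,
                                   Function<float[], float[]> encoder) {
        int h = in.length, w = in[0].length, inC = in[0][0].length;
        int oh = h - k + 1, ow = w - k + 1;
        float[][][] out = new float[oh][ow][];
        for (int y = 0; y < oh; y++)
            for (int x = 0; x < ow; x++) {
                float[] win = new float[k * k * inC];
                int i = 0;
                for (int dy = 0; dy < k; dy++)
                    for (int dx = 0; dx < k; dx++)
                        for (int c = 0; c < inC; c++)
                            win[i++] = in[y + dy][x + dx][c];
                out[y][x] = encoder.apply(win); // one feature vector per position
            }
        return out;
    }

    public static void main(String[] args) {
        // Placeholder "autoencoder": truncate each window to 4 features.
        Function<float[], float[]> enc = win -> Arrays.copyOf(win, 4);
        float[][][] l0 = new float[16][16][3];      // image with 3 channels
        float[][][] l1 = windowLayer(l0, 3, enc);   // 14x14 grid of features
        float[][][] l2 = windowLayer(l1, 3, enc);   // 12x12 grid of features
        System.out.println(l2.length + "x" + l2[0].length + "x" + l2[0][0].length);
    }
}
```

In a real experiment the placeholder encoder would be the trained autoencoder for that layer, and each emitted feature vector could carry the small per-feature “pose” described above.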