Most efficient way to do inference?

Are there any recommendations or best practices for using DL4J for large-scale inference jobs? I see a handful of examples for training, but only a trivial example for inference.

I have a function that generates an iterator over image chips that I want to run inference on, but I don’t know the best way to turn that into something my ComputationGraph can process.


The most efficient way to run inference depends on how your data arrives. If you already have all of it on your local machine, it is pretty easy: just run it with the largest batch size your hardware can handle. That usually results in optimal utilization on single-socket systems. Note that if you have hyper-threading, the fastest configuration may not use all available logical cores, as doing so can actually result in slower execution.
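A minimal sketch of that batching idea in plain Java (the class name, toBatches, and the batch size of 32 are illustrative, not DL4J API; in DL4J the per-batch call would be something like model.output(batch)):

```java
import java.util.ArrayList;
import java.util.List;

public class LargeBatchRunner {
    // Split everything you have locally into the largest batches your
    // hardware can handle; each batch would then go to model.output(...).
    static List<List<float[]>> toBatches(List<float[]> all, int batchSize) {
        List<List<float[]>> batches = new ArrayList<>();
        for (int start = 0; start < all.size(); start += batchSize) {
            int end = Math.min(start + batchSize, all.size());
            batches.add(new ArrayList<>(all.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<float[]> inputs = new ArrayList<>();
        for (int i = 0; i < 100; i++) inputs.add(new float[] { i });

        List<List<float[]>> batches = toBatches(inputs, 32);
        System.out.println(batches.size());        // 4 batches (32+32+32+4)
        System.out.println(batches.get(3).size()); // last, partial batch: 4
    }
}
```

The only tuning knob here is the batch size; in practice you raise it until you run out of memory or stop seeing throughput gains.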

Currently, my data is loaded into a single Java queue by multiple threads, and we have a line along the lines of y = classifierModel.output(false, x).

I see the other signatures for output, and I’m wondering whether some of these options allow me to do things like put my data directly into pinned memory, or move the data to the GPU asynchronously.

So you already have preloading implemented, and have a queue of single inputs, right?
Take as many of those as your hardware can handle and stack them on top of each other, so you get an input shaped like a mini-batch during training. Then let your model work on that.
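Assuming the queue is a java.util.concurrent.BlockingQueue of single chips, the stacking step might look like the sketch below (drainBatch and BATCH_SIZE are hypothetical names; with ND4J you would collect INDArrays instead and Nd4j.vstack them before calling ComputationGraph.output):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BatchedInference {
    static final int BATCH_SIZE = 64;

    // Pull up to BATCH_SIZE single inputs off the queue and stack them
    // into one batch. With ND4J: Nd4j.vstack(chips) -> one [batch, features]
    // INDArray to pass to ComputationGraph.output(...).
    static float[][] drainBatch(BlockingQueue<float[]> queue) {
        List<float[]> chips = new ArrayList<>();
        queue.drainTo(chips, BATCH_SIZE); // non-blocking, takes what is there
        return chips.toArray(new float[0][]);
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<float[]> queue = new ArrayBlockingQueue<>(256);
        for (int i = 0; i < 100; i++) {
            queue.put(new float[] { i, i + 1f }); // fake 2-feature "chips"
        }
        float[][] batch = drainBatch(queue);
        System.out.println("batch size = " + batch.length); // 64
        System.out.println("remaining  = " + queue.size()); // 36
    }
}
```

Because drainTo is non-blocking, a slow producer just yields smaller batches instead of stalling the inference thread.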

As you can see in DL4J Classification Speed, @torstenbm was able to achieve over 200k classifications per second by batching his inputs.

If you meant the output(DataSetIterator) signature: at the moment, that more or less just saves you from looping over your iterator manually.
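In other words, it behaves roughly like the hand-written loop below (a sketch with a stand-in model function, not the actual DL4J internals):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class IteratorInference {
    // Roughly what output(DataSetIterator) saves you from writing:
    // walk the iterator and run the model on each batch in turn.
    static List<float[]> outputAll(Iterator<float[][]> batches,
                                   Function<float[][], float[]> model) {
        List<float[]> results = new ArrayList<>();
        while (batches.hasNext()) {
            results.add(model.apply(batches.next()));
        }
        return results;
    }

    public static void main(String[] args) {
        List<float[][]> data = List.of(
                new float[][] { { 1f }, { 2f } },  // batch of 2 rows
                new float[][] { { 3f } });         // batch of 1 row
        // Stand-in "model": one score per row of the batch.
        Function<float[][], float[]> model = batch -> new float[batch.length];

        List<float[]> out = outputAll(data.iterator(), model);
        System.out.println(out.size());        // 2 result arrays
        System.out.println(out.get(0).length); // 2 scores for the first batch
    }
}
```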

But you can still get something close to what you were talking about with Workspaces.
Take a look at the Workspaces documentation and the examples for it:


I’ll take a look. Thanks!