Recommended way to create INDArray for prediction?

kullanici0606 · May 28, 2020, 9:56am

Hi,

We are using Deeplearning4j to make predictions with a model that was trained in Keras and imported into Deeplearning4j. The version is 1.0.0-beta6 [1]. We are using the CPU backend, and our CPU supports AVX2 and AVX512 instructions [2].

Our code looks similar to this:

private static final int BATCH_SIZE = 4096;
private static final int INPUT_SIZE = 512;
private static final int[] SHAPE = { BATCH_SIZE, 1, INPUT_SIZE };

private void predict(ComputationGraph graph, float[] input1, float[] input2) {
    try(INDArray firstInput = Nd4j.create(input1, SHAPE);
        INDArray secondInput = Nd4j.create(input2, SHAPE)) {
        INDArray result = graph.outputSingle(firstInput, secondInput);
        process(result);
        result.close();
    }
}

After some profiling and logging, we saw that the Nd4j.create part takes a lot longer than graph.outputSingle, so I think we must be doing something wrong.

The float arrays (input1, input2) are reused: we allocate them once in the application’s lifetime and overwrite their contents for each batch.

So, is the recommended way to feed data to the graph for prediction to create the INDArrays every time they are needed, as in the code above? Or can we reuse the INDArrays, since the underlying float arrays are already reused? What is the best and most efficient way to do this?

Could you please help and guide us on this issue?

[1] 1.0.0-beta7 gave an error during predictions for the imported model, so we delayed the migration.
[2] We have declared the nd4j-native avx2 and avx512 dependencies in our pom.xml.

Thanks in advance.

treo · May 28, 2020, 10:31am

How exactly are you measuring the time? The first Nd4j call takes some time to initialize the whole system, so on a cold start, whatever you call first will include the initialization overhead.
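One way to keep the one-time initialization out of a measurement is to run the call a few times untimed before starting the clock. The following is only a minimal sketch using plain Java; the class name `WarmupTiming` and the method `averageNanos` are hypothetical helpers for illustration, not part of ND4J or Deeplearning4j.

```java
final class WarmupTiming {

    /**
     * Runs the task warmupRuns times without timing it, then returns the
     * average wall-clock nanoseconds per call over measuredRuns timed calls.
     */
    static long averageNanos(Runnable task, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        // Untimed calls: any one-time initialization cost is paid here.
        for (int i = 0; i < warmupRuns; i++) {
            task.run();
        }
        // Timed calls: only steady-state cost is measured.
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / measuredRuns;
    }
}
```

Timing the array creation and the prediction as two separate tasks this way would compare their steady-state costs rather than the cold-start cost of whichever happens to run first.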

Other than that, you will very likely benefit from using workspaces, as they allow the system to reuse the memory for both input arrays instead of constantly allocating and deallocating it.

For examples of how to use workspaces, see https://github.com/eclipse/deeplearning4j-examples/blob/master/nd4j-examples/src/main/java/org/nd4j/examples/Nd4jEx15_Workspaces.java
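As a rough illustration of the workspace idea, the sketch below creates the two input arrays inside a workspace that is opened once per batch, so the same off-heap memory is reused on every call. It follows the pattern of the linked example; the class name, method name, workspace ID, and the initial size are assumptions made for illustration. It only covers array creation: how the graph's output call should be combined with an externally opened workspace is not covered in this thread, so the prediction step is left out.

```java
import org.nd4j.linalg.api.memory.MemoryWorkspace;
import org.nd4j.linalg.api.memory.conf.WorkspaceConfiguration;
import org.nd4j.linalg.api.memory.enums.AllocationPolicy;
import org.nd4j.linalg.api.memory.enums.LearningPolicy;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

final class WorkspaceInputSketch {
    static final int BATCH_SIZE = 4096;
    static final int INPUT_SIZE = 512;
    static final int[] SHAPE = { BATCH_SIZE, 1, INPUT_SIZE };

    // Hypothetical workspace ID; any stable string works as a key.
    static final String WORKSPACE_ID = "PREDICTION_INPUTS";

    // Assumed size: room for two float arrays of the batch shape.
    // Allocations that do not fit spill outside the workspace.
    static final WorkspaceConfiguration CONFIG = WorkspaceConfiguration.builder()
            .initialSize(2L * BATCH_SIZE * INPUT_SIZE * Float.BYTES)
            .policyAllocation(AllocationPolicy.STRICT)
            .policyLearning(LearningPolicy.NONE)
            .build();

    /** Creates both inputs inside the workspace and returns the sum of all their elements. */
    static double sumOfInputs(float[] input1, float[] input2) {
        try (MemoryWorkspace ws =
                Nd4j.getWorkspaceManager().getAndActivateWorkspace(CONFIG, WORKSPACE_ID)) {
            // These arrays live in workspace memory; no explicit close() is called on them.
            INDArray first = Nd4j.create(input1, SHAPE);
            INDArray second = Nd4j.create(input2, SHAPE);
            // Anything needed later must be read out before the workspace closes.
            return first.sumNumber().doubleValue() + second.sumNumber().doubleValue();
        }
    }
}
```

Arrays created this way are only valid while the workspace is open, which is why the method returns a plain `double` rather than an INDArray.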

kullanici0606 · May 28, 2020, 11:37am

You are right. After double-checking and running the app longer, I realized that I must have been measuring the time incorrectly. Most probably, as you said, I was seeing the one-time initialization. After running for some time, there is no noticeable overhead from the Nd4j call compared to the prediction.

Thanks a lot for helping and for pointing out workspaces. I will look into them; it looks like they are going to help us.

kullanici0606 · May 28, 2020, 12:55pm

When using workspaces, should I close the created INDArrays or not? In the examples, they are not closed.

treo · May 29, 2020, 6:49am

You shouldn’t need to close them.

kullanici0606 · May 29, 2020, 7:22am

Thank you very much, you helped a lot with this topic.
