Hi,
We are using Deeplearning4j to make predictions with a model that was trained in Keras and imported into Deeplearning4j. The version is 1.0.0-beta6 [1]. We are using the CPU backend, and our CPU supports the AVX2 and AVX512 instruction sets [2].
Our code looks similar to this:
private static final int BATCH_SIZE = 4096;
private static final int INPUT_SIZE = 512;
private static final int[] SHAPE = { BATCH_SIZE, 1, INPUT_SIZE };

private void predict(ComputationGraph graph, float[] input1, float[] input2) {
    try (INDArray firstInput = Nd4j.create(input1, SHAPE);
         INDArray secondInput = Nd4j.create(input2, SHAPE)) {
        INDArray result = graph.outputSingle(firstInput, secondInput);
        process(result);
        result.close();
    }
}
After some profiling and logging, we saw that the Nd4j.create calls take considerably longer than graph.outputSingle, so I suspect we are doing something wrong.
The float arrays (input1, input2) are re-used: we allocate them once for the application's lifetime and overwrite their contents for each batch.
So, is the recommended way of feeding data to the graph for prediction to create the INDArrays anew every time they are needed, as we do in the code above? Or can we re-use the INDArrays as well, since the underlying float arrays are already re-used (see the sketch below for what we have in mind)? What is the most efficient way to do this?
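To make the question concrete, here is a rough sketch of the kind of re-use we are considering. predictReusing is just an illustrative name, and we are assuming that DataBuffer.setData(float[]) overwrites the existing buffer rather than allocating a new one; please correct us if that assumption is wrong:

// Rough sketch of the re-use idea (we are not sure this is valid or recommended):
// allocate the INDArrays once with the final shape and overwrite their contents
// for every batch. This assumes DataBuffer.setData(float[]) copies the new values
// into the existing buffer instead of reallocating it.
private final INDArray firstInput = Nd4j.create(SHAPE);   // default data type is FLOAT
private final INDArray secondInput = Nd4j.create(SHAPE);

private void predictReusing(ComputationGraph graph, float[] input1, float[] input2) {
    firstInput.data().setData(input1);     // overwrite with this batch's data
    secondInput.data().setData(input2);

    INDArray result = graph.outputSingle(firstInput, secondInput);
    process(result);
    result.close();
}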
Could you please help and guide us with this issue?
[1] 1.0.0-beta7 gave an error during prediction with the imported model, therefore we delayed the migration.
[2] We have declared the nd4j-native avx2 and avx512 dependencies in our pom.xml.
Thanks in advance.