Failed to execute op conv3dnew in M2

Hi there, I tried to build DL4J on Linux with GCC 7.0 and ran into the following error message. I think I may have an incompatibility issue, but I couldn't figure it out. Does anyone have an idea how to fix this? Thanks.

2022-08-22 19:07:15 ERROR org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner:1959 - Failed to execute op conv3dnew. Attempted to execute with 3 inputs, 1 outputs, 0 targs, 0 bargs and 14 iargs. Inputs: [(FLOAT,[1,128,128,128,48],c), (FLOAT,[3,3,3,48,16],c), (FLOAT,[1,16],f)]. Outputs: [(FLOAT,[1,128,128,128,16],c)]. tArgs: -. iArgs: [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]. bArgs: -. Op own name: "bebf562e-71d1-459c-bd27-7f6baeafa585" - Please see above message (printed out from c++) for a possible cause of error.

java.lang.RuntimeException: Op [conv3dnew] execution failed
    at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1561)
    at org.deeplearning4j.nn.layers.convolution.Convolution3DLayer.preOutput(Convolution3DLayer.java:275)
    at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:509)
    at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:110)
    at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2442)
    at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1744)
    at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1700)
    at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1630)

pom.xml:

<properties>
    <dl4j.version>1.0.0-M2</dl4j.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datavec</groupId>
        <artifactId>datavec-data-image</artifactId>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datavec</groupId>
        <artifactId>datavec-api</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native</artifactId>
        <version>${dl4j.version}</version>
        <classifier>linux-x86_64-compat</classifier>
    </dependency>
</dependencies>
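For context, a configuration along the following lines produces exactly the shapes in that error (this is only a sketch, not my actual model, and the layer/class names are illustrative): a 3x3x3 Convolution3D with 48 input and 16 output channels, NDHWC layout, same-mode padding, on a [1,128,128,128,48] input.

import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.Convolution3D;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class Conv3dShapeSketch {
    public static void main(String[] args) {
        // Single 3D convolution matching the logged shapes:
        // input [1,128,128,128,48] (NDHWC), kernel [3,3,3,48,16], output [1,128,128,128,16]
        ComputationGraph net = new ComputationGraph(new NeuralNetConfiguration.Builder()
                .graphBuilder()
                .addInputs("in")
                .addLayer("conv", new Convolution3D.Builder(3, 3, 3)
                        .nIn(48)
                        .nOut(16)
                        .stride(1, 1, 1)
                        .convolutionMode(ConvolutionMode.Same) // output keeps the 128^3 spatial size
                        .dataFormat(Convolution3D.DataFormat.NDHWC)
                        .build(), "in")
                .setOutputs("conv")
                .build());
        net.init();

        INDArray input = Nd4j.rand(DataType.FLOAT, 1, 128, 128, 128, 48);
        INDArray out = net.outputSingle(input); // the output() call that fails on the small node
        System.out.println(java.util.Arrays.toString(out.shape()));
    }
}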

Can you also share a few of the lines above that error? The output printed from C++ usually tells you what is actually wrong.

Hi Treo,

Ah, I think my worker node didn't have enough memory to build and run it. I ran it on another instance with more memory and it went through.

I found it requires ~10 GB of memory to allocate all the necessary parameters. Is that expected, or have I misconfigured something else?

Thanks.

That may happen if you have particularly large inputs, large batch sizes, or a big model.

Don't forget that you need memory not only to hold your parameters, but also all intermediate state: the outputs of operations that have to be retained to compute parameter updates during back-propagation, as well as the updater state.

The updater state for the Adam optimizer keeps two extra copies of your parameters (the first and second moment estimates), so parameters plus updater state take roughly 3x the memory of the parameters alone. And depending on the model configuration and input size, even that will be dwarfed by the intermediate state that needs to be retained to calculate the gradients for your model.
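To make that concrete, here is a rough back-of-the-envelope estimate using the shapes from the error message above (a sketch only; it assumes FLOAT, i.e. 4 bytes per element, and ignores workspace and framework overhead):

public class MemoryEstimate {
    public static void main(String[] args) {
        long bytesPerFloat = 4; // FLOAT, per the error message

        // Parameters of the conv3d layer in the log: kernel [3,3,3,48,16] plus bias [1,16]
        long paramCount = 3L * 3 * 3 * 48 * 16 + 16;               // 20,752 floats
        long paramBytes = paramCount * bytesPerFloat;              // ~81 KiB

        // Adam keeps first and second moment estimates: two extra copies of the parameters
        long adamBytes = 2 * paramBytes;                           // ~162 KiB

        // A single input activation of shape [1,128,128,128,48] ...
        long inBytes = 1L * 128 * 128 * 128 * 48 * bytesPerFloat;  // 384 MiB
        // ... and its output of shape [1,128,128,128,16]
        long outBytes = 1L * 128 * 128 * 128 * 16 * bytesPerFloat; // 128 MiB

        System.out.printf("params + Adam state: %.0f KiB%n",
                (paramBytes + adamBytes) / 1024.0);
        System.out.printf("activations for this one layer: %.0f MiB%n",
                (inBytes + outBytes) / (1024.0 * 1024));
    }
}

The parameters and updater state of this layer are tiny, but a single activation tensor at 128^3 resolution is already ~0.4 GB, so a stack of such layers (each retaining activations for the backward pass) plausibly reaches the ~10 GB you observed. If the limit you hit is off-heap memory rather than total RAM, the JavaCPP system properties -Dorg.bytedeco.javacpp.maxbytes and -Dorg.bytedeco.javacpp.maxphysicalbytes control how much ND4J is allowed to allocate.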


Thank you for your quick reply. I will double-check the size.