Failed to execute op conv3dnew in M2

Hi there, I tried to build DL4J on Linux with GCC 7.0 and ran into the following error message. I think I may have an incompatibility issue, but I couldn't figure it out. Does anyone have an idea how to fix this? Thanks.

2022-08-22 19:07:15 ERROR org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner:1959 - Failed to execute op conv3dnew. Attempted to execute with 3 inputs, 1 outputs, 0 targs, 0 bargs and 14 iargs. Inputs: [(FLOAT,[1,128,128,128,48],c), (FLOAT,[3,3,3,48,16],c), (FLOAT,[1,16],f)]. Outputs: [(FLOAT,[1,128,128,128,16],c)]. tArgs: -. iArgs: [3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]. bArgs: -. Op own name: "bebf562e-71d1-459c-bd27-7f6baeafa585" - Please see above message (printed out from c++) for a possible cause of error.

java.lang.RuntimeException: Op [conv3dnew] execution failed
    at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1561)
    at org.deeplearning4j.nn.layers.convolution.Convolution3DLayer.preOutput(Convolution3DLayer.java:275)
    at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:509)
    at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:110)
    at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2442)
    at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1744)
    at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1700)
    at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1630)

pom.xml:

<properties>
    <dl4j.version>1.0.0-M2</dl4j.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datavec</groupId>
        <artifactId>datavec-data-image</artifactId>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.datavec</groupId>
        <artifactId>datavec-api</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native</artifactId>
        <version>${dl4j.version}</version>
        <classifier>linux-x86_64-compat</classifier>
    </dependency>
</dependencies>
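For context, a configuration along the following lines produces exactly the shapes in that error (this is only a sketch, not my actual model, and the layer/class names are illustrative): a 3x3x3 Convolution3D with 48 input and 16 output channels, NDHWC layout, same-mode padding, on a [1,128,128,128,48] input.

import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.Convolution3D;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.nd4j.linalg.api.buffer.DataType;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class Conv3dShapeSketch {
    public static void main(String[] args) {
        // Single 3D convolution matching the logged shapes:
        // input [1,128,128,128,48] (NDHWC), kernel [3,3,3,48,16], output [1,128,128,128,16]
        ComputationGraph net = new ComputationGraph(new NeuralNetConfiguration.Builder()
                .graphBuilder()
                .addInputs("in")
                .addLayer("conv", new Convolution3D.Builder(3, 3, 3)
                        .nIn(48)
                        .nOut(16)
                        .stride(1, 1, 1)
                        .convolutionMode(ConvolutionMode.Same) // output keeps the 128^3 spatial size
                        .dataFormat(Convolution3D.DataFormat.NDHWC)
                        .build(), "in")
                .setOutputs("conv")
                .build());
        net.init();

        INDArray input = Nd4j.rand(DataType.FLOAT, 1, 128, 128, 128, 48);
        INDArray out = net.outputSingle(input); // the output() call that fails on the small node
        System.out.println(java.util.Arrays.toString(out.shape()));
    }
}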

Can you also share a few of the lines above that error? The output printed from C++ usually tells you what is actually wrong.

Hi Treo,

Ah, I think my worker node didn't have enough memory to build and run it. I ran it on another instance with more memory and it went through.

I found it requires ~10 GB of memory to allocate all the necessary parameters. Is that expected, or have I misconfigured something else?

Thanks.

That may happen if you have particularly large inputs, large batch sizes, or a big model.

Don't forget that you need memory not only to hold your parameters, but also all intermediate state: the outputs of operations that have to be retained to compute parameter updates during back-propagation, as well as the updater state.

The updater state for the Adam optimizer keeps two extra copies of your parameters (the first and second moment estimates), so parameters plus updater state take roughly 3x the memory of the parameters alone. And depending on the model configuration and input size, even that will be dwarfed by the intermediate state that needs to be retained to calculate the gradients for your model.
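To make that concrete, here is a rough back-of-the-envelope estimate using the shapes from the error message above (a sketch only; it assumes FLOAT, i.e. 4 bytes per element, and ignores workspace and framework overhead):

public class MemoryEstimate {
    public static void main(String[] args) {
        long bytesPerFloat = 4; // FLOAT, per the error message

        // Parameters of the conv3d layer in the log: kernel [3,3,3,48,16] plus bias [1,16]
        long paramCount = 3L * 3 * 3 * 48 * 16 + 16;               // 20,752 floats
        long paramBytes = paramCount * bytesPerFloat;              // ~81 KiB

        // Adam keeps first and second moment estimates: two extra copies of the parameters
        long adamBytes = 2 * paramBytes;                           // ~162 KiB

        // A single input activation of shape [1,128,128,128,48] ...
        long inBytes = 1L * 128 * 128 * 128 * 48 * bytesPerFloat;  // 384 MiB
        // ... and its output of shape [1,128,128,128,16]
        long outBytes = 1L * 128 * 128 * 128 * 16 * bytesPerFloat; // 128 MiB

        System.out.printf("params + Adam state: %.0f KiB%n",
                (paramBytes + adamBytes) / 1024.0);
        System.out.printf("activations for this one layer: %.0f MiB%n",
                (inBytes + outBytes) / (1024.0 * 1024));
    }
}

The parameters and updater state of this layer are tiny, but a single activation tensor at 128^3 resolution is already ~0.4 GB, so a stack of such layers (each retaining activations for the backward pass) plausibly reaches the ~10 GB you observed. If the limit you hit is off-heap memory rather than total RAM, the JavaCPP system properties -Dorg.bytedeco.javacpp.maxbytes and -Dorg.bytedeco.javacpp.maxphysicalbytes control how much ND4J is allowed to allocate.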


Thank you for your quick reply. I will double-check the size.