RecurrentAttentionLayer error on GPU

When using RecurrentAttentionLayer on the GPU, I get this exception:

Exception in thread "main" java.lang.RuntimeException: Op [multi_head_dot_product_attention_bp] execution failed
at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2316)
at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6599)
at org.nd4j.autodiff.samediff.internal.InferenceSession.doExec(InferenceSession.java:483)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getOutputs(InferenceSession.java:217)
at org.nd4j.autodiff.samediff.internal.InferenceSession.getOutputs(InferenceSession.java:67)
at org.nd4j.autodiff.samediff.internal.AbstractSession.output(AbstractSession.java:380)
at org.nd4j.autodiff.samediff.SameDiff.directExecHelper(SameDiff.java:2601)
at org.nd4j.autodiff.samediff.SameDiff.batchOutputHelper(SameDiff.java:2569)
at org.nd4j.autodiff.samediff.SameDiff.calculateGradientsAndOutputs(SameDiff.java:4049)
at org.nd4j.autodiff.samediff.SameDiff.calculateGradients(SameDiff.java:4010)
at org.deeplearning4j.nn.layers.samediff.SameDiffLayer.backpropGradient(SameDiffLayer.java:169)
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doBackward(LayerVertex.java:149)
at org.deeplearning4j.nn.graph.ComputationGraph.calcBackpropGradients(ComputationGraph.java:2713)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1382)
at org.deeplearning4j.nn.graph.ComputationGraph.computeGradientAndScore(ComputationGraph.java:1342)
at org.deeplearning4j.optimize.solvers.BaseOptimizer.gradientAndScore(BaseOptimizer.java:170)
at org.deeplearning4j.optimize.solvers.StochasticGradientDescent.optimize(StochasticGradientDescent.java:63)
at org.deeplearning4j.optimize.Solver.optimize(Solver.java:52)
at org.deeplearning4j.nn.graph.ComputationGraph.fitHelper(ComputationGraph.java:1166)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1116)
at org.deeplearning4j.nn.graph.ComputationGraph.fit(ComputationGraph.java:1026)

Caused by: java.lang.RuntimeException: [DEVICE] allocation failed; Error code: [2]
at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2500)
at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2306)
... 22 more

The layer is added with code like this:

graphBuilder = graphBuilder.addLayer("attention1",
        new RecurrentAttentionLayer.Builder()
                .activation(Activation.SOFTSIGN)
                .updater(new Adam(learningRateSchedule))
                .nHeads(5)
                .headSize(40)
                .nIn(sentenceNOut)
                .nOut(sentenceNOut)
                .build(),
        lastLayerName);
lastLayerName = "attention1";

It runs OK without the RecurrentAttentionLayer(s).

The DL4J version is beta-6.
The environment is the same as in "No CUDA devices were found".

This looks like you don't have enough memory again. Attention uses quite a lot of memory, and I've struggled even with the 11 GB of VRAM that I have.
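(CUDA error code 2 is cudaErrorMemoryAllocation, i.e. the device ran out of memory.) For a rough sense of scale, here's a back-of-envelope calculation; every concrete number in it is an illustrative assumption, not something taken from your post:

long miniBatch = 32;        // assumed minibatch size
long nHeads = 5;            // from the Builder in your snippet
long seqLen = 1000;         // assumed time-series length
long bytesPerFloat = 4;     // FP32

// Over a full sequence, the attention weights alone amount to roughly
// miniBatch * nHeads * seqLen * seqLen values, before counting the
// query/key/value projections and the activations kept for backprop.
long scoreBytes = miniBatch * nHeads * seqLen * seqLen * bytesPerFloat;
System.out.printf("~%.2f GB for attention scores alone%n", scoreBytes / 1e9);

That quadratic term in the sequence length is why a modest-looking layer can exhaust several GB of VRAM during backprop.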

Yeah, the attention ops need some polishing to reduce the memory footprint they use… it's a known issue.
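In the meantime, the usual levers are a smaller minibatch, fewer or smaller heads, and asking ND4J to garbage-collect cached off-heap buffers more eagerly. A minimal sketch; the 10-second window below is just an example value, not a tuned recommendation:

import org.nd4j.linalg.factory.Nd4j;

// Run ND4J's periodic GC so cached buffers are released more eagerly.
// This trades some throughput for a smaller resident memory footprint.
Nd4j.getMemoryManager().setAutoGcWindow(10000);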

It's indeed short of memory.
When I use only one attention layer instead of the two I had previously, it runs.
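For reference, the working version is just the snippet from my first post, added once instead of twice:

graphBuilder = graphBuilder.addLayer("attention1",
        new RecurrentAttentionLayer.Builder()
                .activation(Activation.SOFTSIGN)
                .updater(new Adam(learningRateSchedule))
                .nHeads(5)
                .headSize(40)
                .nIn(sentenceNOut)
                .nOut(sentenceNOut)
                .build(),
        lastLayerName);
lastLayerName = "attention1";
// No second attention layer is added after this point.

If memory had still been tight, a smaller headSize (e.g. 20 instead of 40) would have been the next thing to try; that value is a guess, not something I benchmarked.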