Nd4j.scatterUpdates slower than simple CPU implementation

ebeaufay · August 31, 2023, 12:15pm

I’m using EmbeddingLayer and SequenceEmbeddingLayer, but I noticed they were slow.

I traced the slowdown to the call to Nd4j.scatterUpdate.

It seemed unreasonably slow, so I tried doing the same update in plain Java.

I changed this:

        // Reset the gradient view, then scatter the rows of epsilon into the
        // rows of W selected by this.indexes
        INDArray weightGradients = this.gradientViews.get("W");
        weightGradients.assign(0);
        INDArray indices = Nd4j.createFromArray(this.indexes);
        Nd4j.scatterUpdate(ScatterUpdate.UpdateOp.ASSIGN, weightGradients, indices, epsilon, WEIGHT_DIM);

to this:

        INDArray weightGradients = this.gradientViews.get("W");
        weightGradients.assign(0);

        int dictionarySize = (int) this.layerConf().getDictionarySize();
        int nOutInt = (int) nOut;

        // Copy epsilon ([minibatch, nOut]) to the Java heap once;
        // eps[i] is the gradient row of example i
        float[][] eps = epsilon.toFloatMatrix();

        // Sum each example's gradient into the row of W that it looked up
        float[][] weightGradientUpdates = new float[dictionarySize][nOutInt];
        for (int i = 0; i < indexes.length; i++) {
            for (int j = 0; j < nOutInt; j++) {
                weightGradientUpdates[indexes[i]][j] += eps[i][j];
            }
        }

        // Nd4j.create(float[][]) already has shape [dictionarySize, nOut], matching W
        weightGradients.addi(Nd4j.create(weightGradientUpdates));

This makes the code 100x faster.
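
One detail worth checking when comparing the two versions: the scatterUpdate call uses UpdateOp.ASSIGN, while the Java loop accumulates with +=, which corresponds to an ADD-style scatter. The two agree only if no index repeats within a minibatch. Since the gradient of an embedding row is the sum over every example that looked it up, ScatterUpdate.UpdateOp.ADD would be the closer match to the Java version. The plain-Java model below illustrates the difference; ScatterModes, scatterAdd and scatterAssign are hypothetical names, not ND4J API, and the last-write-wins behaviour of scatterAssign is only how this sequential model resolves duplicates, not a statement about ND4J's kernels.

```java
import java.util.Arrays;

// Hypothetical illustration of two row-scatter modes; not ND4J API.
final class ScatterModes {

    // ADD mode: every update row is summed into out[indices[i]].
    static float[][] scatterAdd(float[][] target, int[] indices, float[][] updates) {
        float[][] out = copy(target);
        for (int i = 0; i < indices.length; i++) {
            for (int j = 0; j < updates[i].length; j++) {
                out[indices[i]][j] += updates[i][j];
            }
        }
        return out;
    }

    // ASSIGN mode: out[indices[i]] is overwritten; with duplicate indices
    // this sequential model keeps the last write.
    static float[][] scatterAssign(float[][] target, int[] indices, float[][] updates) {
        float[][] out = copy(target);
        for (int i = 0; i < indices.length; i++) {
            out[indices[i]] = updates[i].clone();
        }
        return out;
    }

    // Deep copy so the input array is never modified.
    private static float[][] copy(float[][] a) {
        float[][] c = new float[a.length][];
        for (int r = 0; r < a.length; r++) {
            c[r] = a[r].clone();
        }
        return c;
    }

    public static void main(String[] args) {
        float[][] zeros = new float[3][2];
        int[] indices = {1, 1, 2};                         // index 1 appears twice
        float[][] updates = {{1f, 1f}, {2f, 2f}, {5f, 5f}};
        // prints [[0.0, 0.0], [3.0, 3.0], [5.0, 5.0]]
        System.out.println(Arrays.deepToString(scatterAdd(zeros, indices, updates)));
        // prints [[0.0, 0.0], [2.0, 2.0], [5.0, 5.0]]
        System.out.println(Arrays.deepToString(scatterAssign(zeros, indices, updates)));
    }
}
```

With a repeated index, ADD keeps both contributions (1 + 2 = 3) while ASSIGN keeps only one of them.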

The CPU workaround does depend on the size of weightGradients, though: if it gets too large (>1M), RAM usage grows too much and the call to Nd4j.create becomes slow. So this CPU workaround doesn’t scale.
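
One way to keep the CPU-side accumulation from growing with the dictionary is to allocate gradient rows only for indices that actually occur in the minibatch. The sketch below is a hypothetical helper in plain Java (SparseEmbeddingGrad and accumulate are made-up names), assuming epsilon has already been copied to a float[][] with one row per example, for instance via epsilon.toFloatMatrix(). It has not been benchmarked against scatterUpdate. Because weightGradients is zeroed first, each touched row could then be written back with something like weightGradients.putRow(index, Nd4j.createFromArray(row)), although on the CUDA backend each of those calls carries its own overhead, so this is only likely to help while the number of distinct indices per batch stays small.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: sums embedding gradients only for rows that were looked up,
// so memory grows with the number of distinct indices, not the dictionary size.
final class SparseEmbeddingGrad {

    // indexes[i] is the dictionary row used by example i; eps[i] is its gradient (length nOut).
    static Map<Integer, float[]> accumulate(int[] indexes, float[][] eps) {
        final int nOut = eps.length == 0 ? 0 : eps[0].length;
        Map<Integer, float[]> rows = new HashMap<>();
        for (int i = 0; i < indexes.length; i++) {
            // Allocate a zeroed row the first time an index is seen
            float[] row = rows.computeIfAbsent(indexes[i], k -> new float[nOut]);
            for (int j = 0; j < nOut; j++) {
                row[j] += eps[i][j];
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] indexes = {7, 3, 7};
        float[][] eps = {{1f, 2f}, {10f, 20f}, {0.5f, 0.5f}};
        Map<Integer, float[]> rows = accumulate(indexes, eps);
        System.out.println(rows.size());                          // prints 2
        System.out.println(rows.get(7)[0] + " " + rows.get(7)[1]); // prints 1.5 2.5
    }
}
```

For indexes {7, 3, 7}, only rows 7 and 3 are allocated, and row 7 holds the sum of both of its occurrences.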

These are my backend dependencies (I also have cuDNN set up):

        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-cuda-11.6</artifactId>
            <version>1.0.0-M2.1</version>
        </dependency>
        <dependency>
            <groupId>org.nd4j</groupId>
            <artifactId>nd4j-cuda-11.6</artifactId>
            <version>1.0.0-M2.1</version>
            <classifier>windows-x86_64-cudnn</classifier>
        </dependency>

agibsonccc · September 1, 2023, 7:38am

@ebeaufay can you set up a reproducer for me and file a GitHub issue?

ebeaufay · September 1, 2023, 5:51pm

Yes, I added an issue: "Nd4j.ScatterUpdates has a large overhead" (Issue #10029, deeplearning4j/deeplearning4j on GitHub).

agibsonccc · September 2, 2023, 2:14am

@ebeaufay thanks, I’ll take a look!
