Unexpected Slow Performance

officiallor · October 13, 2021, 12:22pm

With the following code and 16 threads, it takes 3 minutes to crash with an OutOfMemoryError. With 8 threads it crashes faster. Memory usage spikes up to 12 GB and then drops. CPU usage is around 70% or more.
I couldn’t use muli on the second line because I was getting another exception:
java.lang.IllegalStateException: Cannot perform in-place operation "muli": result array shape does not match the broadcast operation output shape: [35040].muli([35040, 1]) != [35040].

            in.getOf((n-1),0).assign(saved.getAllColumnsOfRow((n-1)).mul(timestep / 60f)
                .mul(retail.muli((1 + e_infr).pow(n.toInt() - 1)))
                .addi(sold.getAllColumnsOfRow(n-1).muli(timestep / 60f)
                .muli(exc_enrg_pr * (1 + e_infr).pow(n.toInt() - 1))).sumFloat())

agibsonccc · October 13, 2021, 12:34pm

@officiallor Hmm… that looks a bit off. If you’re only using vectors, you can easily reshape them to ensure they’re broadcastable. It might be worth doing that for your use case. If you don’t know the length of the output, you can just do .reshape(length) or something similar.
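
To illustrate why the shapes matter, here is a minimal plain-Java sketch of NumPy-style shape broadcasting applied to the two shapes from the exception message. It does not call ND4J: `BroadcastShape`, `broadcast`, and `elementCount` are hypothetical names, and it is an assumption that the out-of-place `mul` follows these rules in this case. Under that assumption, multiplying a `[35040]` row by a `[35040, 1]` column yields a `[35040, 35040]` result instead of a vector, which would be consistent with the memory spike.

```java
final class BroadcastShape {
    /** NumPy-style broadcast of two shapes: align on the right, size-1 axes stretch. */
    static long[] broadcast(long[] a, long[] b) {
        int rank = Math.max(a.length, b.length);
        long[] out = new long[rank];
        for (int k = 1; k <= rank; k++) {
            // Missing leading axes are treated as size 1.
            long da = k <= a.length ? a[a.length - k] : 1;
            long db = k <= b.length ? b[b.length - k] : 1;
            if (da != db && da != 1 && db != 1) {
                throw new IllegalArgumentException("Shapes are not broadcastable");
            }
            out[rank - k] = Math.max(da, db);
        }
        return out;
    }

    /** Number of elements in an array of the given shape. */
    static long elementCount(long[] shape) {
        long n = 1;
        for (long d : shape) {
            n *= d;
        }
        return n;
    }

    public static void main(String[] args) {
        long[] row = {35040};        // shape of one row of saved/sold
        long[] column = {35040, 1};  // shape of retail after transpose()
        long[] result = broadcast(row, column);
        System.out.println(java.util.Arrays.toString(result));
        System.out.println(elementCount(result) * 4L + " bytes as 32-bit floats");
        // After reshaping the column to [35040], the product stays a vector.
        System.out.println(java.util.Arrays.toString(broadcast(row, new long[]{35040})));
    }
}
```

At 4 bytes per float, a `[35040, 35040]` temporary holds 1,227,801,600 elements, roughly 4.9 GB. Reshaping the column to `[35040]` first keeps every intermediate result at vector size.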

Could you give me a bit more than your loop here, or something I could run on my end to dissect this a bit for you? If I can run a self-contained calculation, I can try to give you something a bit more efficient.

officiallor · October 13, 2021, 1:03pm

The following code gives the idea. I don’t know what else I can provide to help.

        val n_max = 20
        val e_infr = 2
        val timestep = 15
        val exc_enrg_pr = 0
        val in = zeroes(n_max.toLong(), 1)
        val out = zeroes(n_max.toLong(), 1)
        val sold = zeroes(n_max.toLong(), 35040)
        val saved = zeroes(n_max.toLong(), 35040)
        var retail = zeroes(1, 35040)
        val numberHere = please enter a random number of your choice

        // this for is written to fill retail for the example,
        // the original has other operations inside - I just want to fill retail
        for (i in 1..35040 step 96) {
            retail.get(NDArrayIndex.all(), NDArrayIndex.interval(i - 1, i + 95)).assign(numberHere)
        }

        retail = retail.transpose().mul(1 + numberHere / 100f)

        for (n in 1..n_max.toLong()) {
            in.getOf((n - 1), 0).assign(
                saved.getAllColumnsOfRow((n - 1)).mul(timestep / 60f)
                    .mul(retail.mul((1 + e_infr).pow(n.toFloat() - 1)))
                    .add(
                        sold.getAllColumnsOfRow(n - 1).mul(timestep / 60f)
                            .mul(exc_enrg_pr * (1 + e_infr).pow(n.toFloat() - 1))
                    ).sumNumber().toFloat()
            )
        }
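
Written out, each iteration of the loop above appears to reduce one row of saved and one row of sold to a single number. The following plain-Java sketch spells out that arithmetic with primitive arrays so results can be checked against a simple reference. It is not ND4J and not the poster's helpers: `RowValue` and `rowValue` are hypothetical names, and it assumes the intended product is element-wise (row element j times retail element j).

```java
final class RowValue {
    /**
     * Sum over j of
     *   saved[j] * dt * retail[j] * factor + sold[j] * dt * excEnrgPr * factor
     * where dt = timestepMinutes / 60 and factor = (1 + eInfr)^(n - 1).
     */
    static double rowValue(double[] savedRow, double[] soldRow, double[] retail,
                           double timestepMinutes, double excEnrgPr,
                           double eInfr, int n) {
        double dt = timestepMinutes / 60.0;
        double factor = Math.pow(1 + eInfr, n - 1);
        double sum = 0.0;
        for (int j = 0; j < savedRow.length; j++) {
            // One pass, no temporary arrays.
            sum += savedRow[j] * dt * retail[j] * factor
                 + soldRow[j] * dt * excEnrgPr * factor;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] saved = {1, 2};
        double[] sold = {3, 4};
        double[] retail = {10, 20};
        // Small made-up inputs, chosen only to make the arithmetic easy to follow.
        System.out.println(rowValue(saved, sold, retail, 30, 1.0, 1, 2));
    }
}
```

A single pass like this needs no intermediate arrays, so it can serve as a baseline when comparing the memory use of the chained array operations.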

agibsonccc · October 13, 2021, 1:13pm

@officiallor it’ll take me some time to replicate this. When I asked for code, I meant something I could copy/paste and run really fast, not pseudocode.
By not giving it to me in exact form, I now have to spend time understanding your interpretation of certain things as well as reimplementing pseudocode like getOf(…), when all I should be doing is focusing on your performance issue.

I’ll take a look, but I want to give that feedback for future reference. Costing me extra time doesn’t help me help you.

agibsonccc · October 13, 2021, 1:22pm

@officiallor here’s the initial sample I’m looking at and will debug:

    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;
    import org.nd4j.linalg.indexing.NDArrayIndex;

    // Class name is arbitrary; the original post showed only main().
    public class SlowPerformanceRepro {
        public static void main(String... args) {
            int nMax = 20;
            int eInfr = 2;
            int timeStep = 15;
            int excEnrgPr = 0;
            INDArray in = Nd4j.zeros(nMax, 1);
            INDArray out = Nd4j.zeros(nMax, 1);
            INDArray sold = Nd4j.zeros(nMax, 35040);
            INDArray saved = Nd4j.zeros(nMax, 35040);
            INDArray retail = Nd4j.zeros(1, 35040);
            int numberHere = 2;
            // Fill retail in blocks of 96 columns.
            for (int i = 1; i < 35040; i += 96) {
                retail.get(NDArrayIndex.all(), NDArrayIndex.interval(i - 1, i + 95)).assign(numberHere);
            }

            // retail becomes a column of shape [35040, 1].
            retail = retail.transpose().mul(1 + numberHere / 100f);
            for (int i = 1; i < nMax; i++) {
                in.slice((i - 1), 0).assign(
                        saved.slice((i - 1)).mul(timeStep / 60f)
                                .muli(retail.mul(Math.pow(1 + eInfr, i - 1)))
                                .addi(sold.slice(i - 1).muli(timeStep / 60f)
                                        .muli(Math.pow(excEnrgPr * (1 + eInfr), i - 1))
                                ).sumNumber().doubleValue());
            }
        }
    }

I’ll post this so you can follow along.

officiallor · October 13, 2021, 1:23pm

Ohh, sorry, but I couldn’t help more because it’s not so straightforward.
getOf() accepts vararg Long. It creates NDArrayIndexes and calls get() with those indexes as parameters. Here it can be replaced with get(NDArrayIndex.point(y), NDArrayIndex.point(z)).

agibsonccc · October 13, 2021, 1:26pm

@officiallor do you mind posting your helpers just in case?

Edit: Good news is I’m reproducing your memory issues. So this is a start to me looking at it in depth. Thanks for meeting me in the middle a bit.

officiallor · October 13, 2021, 1:28pm

Can I send them privately? I mean it’s a big file. Or I can send the specific ones that I’m using in this example in a PM.

officiallor · October 15, 2021, 7:25am

Hello,
I noticed that in the for loops you are using <, but it should be <=. I don’t think I made that clear in my earlier posts. The .. operator in Kotlin is inclusive.
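
To make the difference concrete, here is a small plain-Java sketch that counts iterations for both bound styles, using the loop limits from the code in this thread. `RangeBounds`, `countExclusive`, and `countInclusive` are hypothetical names used only for illustration.

```java
final class RangeBounds {
    /** Iterations of: for (int i = start; i < end; i += step) */
    static int countExclusive(int start, int end, int step) {
        int count = 0;
        for (int i = start; i < end; i += step) {
            count++;
        }
        return count;
    }

    /** Iterations of: for (int i = start; i <= end; i += step), matching Kotlin's start..end step s */
    static int countInclusive(int start, int end, int step) {
        int count = 0;
        for (int i = start; i <= end; i += step) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Outer loop: Kotlin 1..20 versus Java i < 20.
        System.out.println(countInclusive(1, 20, 1) + " vs " + countExclusive(1, 20, 1));
        // Fill loop: Kotlin 1..35040 step 96 versus Java i < 35040.
        System.out.println(countInclusive(1, 35040, 96) + " vs " + countExclusive(1, 35040, 96));
    }
}
```

For the fill loop the two forms visit the same indices, because 35040 itself is never reached with a step of 96. For the outer loop the exclusive form stops one iteration early, so the last row is never computed.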
