Model works with CPU, but outputs all 0's with CUDA

Hi!

I’m having issues getting my existing model to work with CUDA.
I have a model with a custom dataset that works perfectly fine on CPU. My metrics are:

========================Evaluation Metrics========================
 # of classes:    2
 Accuracy:        0.7165
 Precision:       0.7142
 Recall:          0.7157
 F1 Score:        0.7149
Precision, recall & F1: reported for positive class (class 1 - "1") only


=========================Confusion Matrix=========================
    0    1
-----------
 3867 1524 | 0 = 0
 1513 3808 | 1 = 1

But when I update my pom to use the CUDA backend:

<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-10.1</artifactId>
    <version>1.0.0-beta7</version>
</dependency>

My eval becomes

========================Evaluation Metrics========================
 # of classes:    2
 Accuracy:        0.0000
 Precision:       0.0000
 Recall:          0.0000
 F1 Score:        0.0000
Precision, recall & F1: reported for positive class (class 1 - "1") only

Warning: 1 class was never predicted by the model and was excluded from average precision
Classes excluded from average precision: [0]
Warning: 1 class was never predicted by the model and was excluded from average recall
Classes excluded from average recall: [1]

=========================Confusion Matrix=========================
     0     1
-------------
     0 10712 | 0 = 0
     0     0 | 1 = 1
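
For reference, the headline metrics can be recomputed by hand from the two confusion matrices. The sketch below is plain Java, not DL4J's own evaluation code; the class name ConfusionMetrics and the convention of returning 0 when a denominator is 0 are assumptions made for illustration. Rows are actual classes and columns are predicted classes, as in the printed matrices.

```java
public class ConfusionMetrics {
    // m[actual][predicted] for a 2-class problem; class 1 is the positive class.

    static double accuracy(long[][] m) {
        long total = m[0][0] + m[0][1] + m[1][0] + m[1][1];
        return total == 0 ? 0.0 : (double) (m[0][0] + m[1][1]) / total;
    }

    static double precision(long[][] m) {
        long predictedPositive = m[0][1] + m[1][1];
        // Assumed convention: report 0 when nothing was predicted positive.
        return predictedPositive == 0 ? 0.0 : (double) m[1][1] / predictedPositive;
    }

    static double recall(long[][] m) {
        long actualPositive = m[1][0] + m[1][1];
        // Assumed convention: report 0 when there are no actual positives.
        return actualPositive == 0 ? 0.0 : (double) m[1][1] / actualPositive;
    }

    public static void main(String[] args) {
        long[][] cpu = {{3867, 1524}, {1513, 3808}}; // CPU run from this post
        long[][] gpu = {{0, 10712}, {0, 0}};         // CUDA run from this post
        System.out.printf("CPU  acc=%.4f prec=%.4f rec=%.4f%n",
                accuracy(cpu), precision(cpu), recall(cpu));
        System.out.printf("CUDA acc=%.4f prec=%.4f rec=%.4f%n",
                accuracy(gpu), precision(gpu), recall(gpu));
    }
}
```

In the CUDA run every example sits in the row for actual class 0 and the column for predicted class 1, so there are no correct predictions and no true positives, which is why every metric is 0. The row for actual class 1 is empty, which suggests the labels seen by the evaluation differ from the CPU run, not only the predictions.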

I can run the LeNetMNIST example with CUDA so I don’t think it’s a problem with my CUDA setup. I have also tried running my model with the existing MnistDataSetIterator and it also works fine. So I think it has something to do with my dataset. Is there something I am missing with custom datasets and CUDA?

I’m running this with:
RTX 2060 Super
CUDA 10.1
Nvidia drivers 460.89

Thanks!

@chinproisbestpro it could have something to do with precision. Could you try setting the data type of the input arrays? You can do that with INDArray.castTo(DataType.DOUBLE). That will give you more precision. DL4J’s default data type is float to save space.
For some datasets it may make sense to use double for better convergence.
You can also set the type the net uses like this: deeplearning4j-examples/Ex3LambdaVertex.java at 9799225aa9edec0af7985b07510bfbd02bc80df6 · eclipse/deeplearning4j-examples · GitHub

Hey @agibsonccc thanks for the response!

I tried setting the input and labels to DataType.DOUBLE and setting the net to use DOUBLE as well, but I’m still getting the same result. If it helps, the input arrays are filled with many (~600) 0’s and a sparse handful (~20) of 1’s.

@chinproisbestpro could you make sure that the datasets are exactly the same when you’re training? The first step to verifying any issues is to make sure both training sets are the exact same. Short of that unless I can run your code line for line, there’s not a lot I can tell you. If you can post your whole training loop I can take a look.

@agibsonccc, the dataset is exactly the same. I am shuffling it before running it through the net, but I doubt that’s a problem.

Here is a code snippet of what I’m doing

Thanks again for helping out

@chinproisbestpro actually it is. If it’s not exactly reproducible every time I can’t debug it for you.
Depending on how your samples are distributed across batches and your batch size, it does affect the learning. Also, FWIW, if your input really has only 621 columns, you won’t see much if any speed increase from a GPU, depending on your batch size.

Ensure you set a seed for your neural net so the results stay the same. That includes the weight initialization.

Based on what I’m seeing here, you could easily put this in a separate pipeline and pre-save the shuffled datasets. Once you do that, call:

dataset.save(..) 

on each one. Once the pipeline is reproducible, run this before you start your training:

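import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.profiler.ProfilerConfig;

// Make the executioner check op results for Inf and NaN values
Nd4j.getExecutioner().setProfilingConfig(ProfilerConfig.builder()
        .checkForINF(true)
        .checkForNAN(true)
        .build());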

See whether the CPU or GPU run outputs anything different after that. If your training doesn’t diverge, the next step would be digging into the UI.
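
One way to make the CPU versus GPU comparison concrete is to dump the network output for the same saved batch on each backend and compare the two arrays. The following is a minimal sketch in plain Java, assuming the outputs have already been copied into float[] arrays; OutputCheck, firstNonFinite and maxAbsDiff are hypothetical helpers, not part of ND4J, and the numbers in main are made up purely to exercise the helpers.

```java
public class OutputCheck {
    /** Index of the first NaN or infinite value, or -1 if all values are finite. */
    static int firstNonFinite(float[] values) {
        for (int i = 0; i < values.length; i++) {
            if (Float.isNaN(values[i]) || Float.isInfinite(values[i])) {
                return i;
            }
        }
        return -1;
    }

    /** Largest absolute element-wise difference between two equally sized arrays. */
    static double maxAbsDiff(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs((double) a[i] - b[i]));
        }
        return max;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the thread.
        float[] cpuOut = {0.25f, 0.75f, 0.60f, 0.40f};
        float[] gpuOut = {0.25f, 0.75f, Float.NaN, 0.40f};
        System.out.println("first non-finite on GPU: " + firstNonFinite(gpuOut));
        if (firstNonFinite(cpuOut) < 0 && firstNonFinite(gpuOut) < 0) {
            System.out.println("max abs diff: " + maxAbsDiff(cpuOut, gpuOut));
        }
    }
}
```

Running the non-finite check first matters because a NaN in either array would otherwise turn the difference itself into NaN.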

@agibsonccc
So I took out the dataset.shuffle() and now it works on my GPU!
Is there something I have to tune with the shuffle method to make it work on GPU? No matter what, if I shuffle, all the metrics come out as 0.

I even tested by using the same dataset, then shuffling with the same seed on GPU and CPU. CPU works fine but GPU outputs 0’s.

@chinproisbestpro there was an issue someone mentioned about shuffle not working properly on the GPU. You shouldn’t do ETL operations like that on the GPU anyway. It’s best to use the CPU for that and save your GPU for training. I’ll look into that a bit later.
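
As a rough illustration of shuffling on the CPU in a reproducible way, the sketch below draws one seeded permutation and applies it to features and labels together, so each row keeps its label. It is plain Java on host arrays, not ND4J's DataSet.shuffle(); the class name CpuShuffle and its methods are hypothetical, and it assumes the data fits in memory as float[][].

```java
import java.util.Random;

public class CpuShuffle {
    /** Permutation of 0..n-1 drawn with a fixed seed (Fisher-Yates). */
    static int[] permutation(int n, long seed) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        Random rng = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = rng.nextInt(i + 1);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        return idx;
    }

    /** New array whose i-th row is rows[order[i]]; the input is not modified. */
    static float[][] reorder(float[][] rows, int[] order) {
        float[][] out = new float[order.length][];
        for (int i = 0; i < order.length; i++) {
            out[i] = rows[order[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] features = {{1f, 0f}, {0f, 1f}, {1f, 1f}, {0f, 0f}};
        float[][] labels = {{1f, 0f}, {0f, 1f}, {0f, 1f}, {1f, 0f}};
        int[] order = permutation(features.length, 42L);
        // The same order is applied to both arrays so rows stay paired.
        float[][] shuffledFeatures = reorder(features, order);
        float[][] shuffledLabels = reorder(labels, order);
        System.out.println("rows shuffled: " + shuffledFeatures.length
                + ", labels shuffled: " + shuffledLabels.length);
    }
}
```

Because the permutation depends only on the seed and the row count, the same order can be regenerated on any machine, and the shuffled arrays can be saved once and reused for both the CPU and the GPU run.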

Thanks Adam, I’ll make sure to shuffle on CPU instead