PCA on GPU - Insufficient memory

Hi All,

I tried to perform PCA factorization on a relatively small data set using the GPU but ran into an error. Code sample:

INDArray data = Nd4j.rand(1000,31000);
PCA.pca_factor(data, 50, false);

Output:

[main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
[main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 32
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [4]; Memory: [7.1GB];
[main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
[main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 10.2.89
[main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [GeForce GTX 1060 6GB]; cc: [6.1]; Total memory: [6442450944]
Exception in thread "main" java.lang.RuntimeException: Allocation failed: [[DEVICE] allocation failed; Error code: [2]]
	at org.nd4j.nativeblas.OpaqueDataBuffer.allocateDataBuffer(OpaqueDataBuffer.java:79)
	at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.initPointers(BaseCudaDataBuffer.java:389)
	...

While it’s not completely clear from the error message, it looks like insufficient memory is the cause.
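For a rough sense of scale, here is a minimal sketch of the arithmetic. It assumes 4-byte float elements, and it assumes (I have not confirmed this against the ND4J source) that a full SVD-based factorization materializes a dense columns x columns matrix on the device. The class and method names are hypothetical and only for illustration; the device size is the "Total memory" value from the log above.

```java
public class PcaMemoryEstimate {

    // Bytes needed for a dense rows x cols matrix with the given element size.
    static long denseBytes(long rows, long cols, int bytesPerElement) {
        return rows * cols * bytesPerElement;
    }

    // Fraction of device memory that a buffer of the given size would occupy.
    static double fractionOfDevice(long bytes, long deviceBytes) {
        return (double) bytes / (double) deviceBytes;
    }

    public static void main(String[] args) {
        long rows = 1000L;
        long cols = 31000L;
        int floatBytes = 4;                    // assumption: float data type
        long deviceBytes = 6_442_450_944L;     // "Total memory" from the log

        // The input itself is small.
        long input = denseBytes(rows, cols, floatBytes);

        // Assumption: the factorization allocates a dense cols x cols matrix.
        long square = denseBytes(cols, cols, floatBytes);

        System.out.printf("input matrix:       %,d bytes%n", input);
        System.out.printf("cols x cols matrix: %,d bytes (%.0f%% of device)%n",
                square, 100.0 * fractionOfDevice(square, deviceBytes));
    }
}
```

If that assumption holds, a single 31000 x 31000 float buffer already takes more than half of the 6 GB card, so any additional buffer of similar size would not fit, even though the input matrix is only about 124 MB.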

[Screenshot: 2020-08-19-ND4J-PCA]

The CPU backend works just fine.

Could somebody please advise whether there is a way to run PCA on the GPU in this case? Does it make sense to use the shared memory, and if so, how? How should such cases be tackled in general?

Hi, that doesn’t seem right. Do you have a test case we can see?

Hi!

It reproduces on any sufficiently large INDArray, like the one in the code snippet from the post above (ready-to-run sample: https://github.com/sogawa-sps/testnd4j). In my real-world app I have a sparse matrix (density less than 0.001), but the issue reproduces on a dense, randomly filled matrix in the same way.