Here are some preliminary tests. I got tired, so I can't fix some of the errors right now.
Here are some weird performance results: some layers should be faster than others but aren't. No explanation for now, only raw data to check.
Take a look at the "16 batch size 16 channels" section. Normal convolution took 716 ms, but the simulated compact cnn2d with shape [256, 1, 28, 28] took almost 5 seconds, even though by my count it does 16 * 16 times fewer calculations, because each filter sees only 1 channel. So in theory it should run in about 3 ms (716 ms / 256), or at least around 200 ms in practice.
Something is going wrong here, and/or I'm tired. Right now I can barely focus, keep forgetting things, and can't concentrate on the tests, so I decided to just post the raw data here so you can check it if you want.
Info: the "Layer simulated compact cnn2d 16 channels" test with 16 input channels is effectively a 1-channel convolution, because the test divides the channel count by 16 and multiplies the batch size by 16.
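To make that concrete, here is a minimal sketch (mine, not from the test code below) of the channel-into-batch folding, assuming ND4J's C-order reshape semantics:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// "Simulated compact": fold groups of 16 channels into the batch axis, so a
// [1, 256, 28, 28] input becomes 16 samples of 16 channels each.
INDArray x = Nd4j.ones(1, 256, 28, 28);       // [minibatch, channels, H, W]
INDArray compact = x.reshape(16, 16, 28, 28); // [minibatch * 16, channels / 16, H, W]

The reshape itself is just a view over the same memory, so only the convolution's work distribution changes.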
Also check the "Single batch" section: it has some fast results (about 15 times faster than cnn2d).
From this I'd guess that cnn3d is far better optimized than cnn2d, perhaps because of how CUDA work is laid out. In any case, 16 channels is a very small count, and tests with larger channel counts are needed for more reliable numbers. Only the first tests ("Single batch") use 256 channels, but those are single-batch.
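For reference, these are the follow-up runs I would add with the same harness (the PerformanceTest class at the bottom) to get batched numbers at 256 channels; the largest one may well hit the same cudaMalloc out-of-memory as the 256-batch / 16-channel section did:

performTests(16, 256, 256);  // 16 batch, 256 channels
performTests(256, 256, 256); // 256 batch, 256 channels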
I think there is a way to move the batch/channel dimensions around to improve how CUDA units are scheduled, so all the CUDA cores get equal tasks. I got it working for the single-batch case; the single-channel tests went wrong because of a shape error.
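For the single-batch case the move is just an axis swap; a minimal sketch using the same ND4J imports as above (again mine, not the test code):

// Swap the batch and channel axes so the big axis is the one being
// parallelized over: [1, 256, 28, 28] -> [256, 1, 28, 28].
// permute() returns a view; dup() makes it a contiguous copy before
// feeding it to a layer.
INDArray x = Nd4j.ones(1, 256, 28, 28);
INDArray swapped = x.permute(1, 0, 2, 3).dup();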
Output, trimmed a bit because of the post length limit (it's long; kept for debugging):
Starting tests
#### Single batch ####
Starting tests with minibatch size 1 nIn 256 nOut 256
03:53:11.149 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
03:53:11.156 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
03:53:11.157 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.
03:53:11.161 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
03:53:11.161 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.
03:53:12.850 [main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 32
03:53:12.893 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Windows 10]
03:53:12.893 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [8]; Memory: [8,0GB];
03:53:12.893 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
03:53:12.907 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 11.0.221
03:53:12.908 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [NVIDIA GeForce RTX 3060]; cc: [8.6]; Total memory: [12884901888]
03:53:12.908 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - Backend build information:
MSVC: 192829914
STD version: 201402L
CUDA: 11.0.221
DEFAULT_ENGINE: samediff::ENGINE_CUDA
HAVE_FLATBUFFERS
03:53:12.932 [main] INFO org.deeplearning4j.nn.graph.ComputationGraph - Starting ComputationGraph with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
03:53:14.124 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:53:14.726 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:53:14.726 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer normal convolution shape [1, 256, 28, 28] result millis 3792
Testing layer normal (?) 3D convolution shape [1, 256, 1, 28, 28] result millis 215
Testing layer normal separable convolution shape [1, 256, 28, 28] result millis 251
03:53:20.399 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:53:20.400 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:53:20.400 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer simulated compact cnn2d 16 channels shape [16, 16, 28, 28] result millis 752
Testing layer simulated compact cnn3d 16 channels shape [1, 16, 16, 28, 28] result millis 228
Testing layer normal separable simulated compact convolution shape [16, 16, 28, 28] result millis 406
Testing layer simulated channelwise (?) cnn3d 1 channel shape [1, 1, 256, 28, 28] result millis 1118
Testing layer simulated inverted channelwise (?) cnn3d 1 channel shape [1, 256, 1, 28, 28] result millis 167
03:53:23.729 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:53:23.730 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:53:23.730 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer simulated channelwise cnn2d 1 channel shape [256, 1, 28, 28] result millis 5557
Testing layer normal channelwise cnn2d 1 channel shape [1, 256, 28, 28] got exception Cannot do forward pass in DepthwiseConvolution2D layer (layer name = layer, layer index = 1): input array channels does not match CNN layer configuration (data format = NCHW, data input channels = 256, [minibatch,inputDepth,height,width]=[1, 256, 28, 28]; expected input channels = 1) (layer name: layer, layer index: 1, layer type: DepthwiseConvolution2DLayer)
Testing layer normal channelwise with single channel test cnn2d 1 channel shape [256, 1, 28, 28] result millis 214
#####################################
000 215 ms ___ Layer normal (?) 3D convolution shape [1, 256, 1, 28, 28]
000 214 ms ___ Layer normal channelwise with single channel test cnn2d 1 channel shape [256, 1, 28, 28]
003 792 ms ___ Layer normal convolution shape [1, 256, 28, 28]
000 251 ms ___ Layer normal separable convolution shape [1, 256, 28, 28]
000 406 ms ___ Layer normal separable simulated compact convolution shape [16, 16, 28, 28]
001 118 ms ___ Layer simulated channelwise (?) cnn3d 1 channel shape [1, 1, 256, 28, 28]
005 557 ms ___ Layer simulated channelwise cnn2d 1 channel shape [256, 1, 28, 28]
000 752 ms ___ Layer simulated compact cnn2d 16 channels shape [16, 16, 28, 28]
000 228 ms ___ Layer simulated compact cnn3d 16 channels shape [1, 16, 16, 28, 28]
000 167 ms ___ Layer simulated inverted channelwise (?) cnn3d 1 channel shape [1, 256, 1, 28, 28]
#####################################
#### Single channel ####
Starting tests with minibatch size 256 nIn 1 nOut 256
03:53:30.700 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:53:30.701 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:53:30.702 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer normal convolution shape [256, 1, 28, 28] result millis 5538
Testing layer normal (?) 3D convolution shape [256, 1, 1, 28, 28] result millis 4396
Testing layer normal separable convolution shape [256, 1, 28, 28]shapeInfo Mismatched shape: [2, 186624,256, 1,186624, 8192,1,102]
Shape requested: : {186624, 1}
got exception Op [sconv2d] execution failed
Skipping layer test 'simulated compact cnn2d 16 channels' channels = 0
Skipping layer test 'simulated compact cnn3d 16 channels' channels = 0
Skipping layer test 'normal separable simulated compact convolution' channels = 0
Testing layer simulated channelwise (?) cnn3d 1 channel shape [256, 1, 1, 28, 28] result millis 4392
Testing layer simulated inverted channelwise (?) cnn3d 1 channel shape [256, 1, 1, 28, 28] result millis 4387
03:53:53.378 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:53:53.379 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:53:53.379 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer simulated channelwise cnn2d 1 channel shape [256, 1, 28, 28] got exception Cannot do forward pass in Convolution layer (layer name = layer, layer index = 1): input array channels does not match CNN layer configuration (data format = NCHW, data input channels = 1, [minibatch, channels, height, width]=[256, 1, 28, 28]; expected input channels = 256) (layer name: layer, layer index: 1, layer type: ConvolutionLayer)
Testing layer normal channelwise cnn2d 1 channel shape [256, 1, 28, 28] got exception Cannot do forward pass in DepthwiseConvolution2D layer (layer name = layer, layer index = 1): input array channels does not match CNN layer configuration (data format = NCHW, data input channels = 1, [minibatch,inputDepth,height,width]=[256, 1, 28, 28]; expected input channels = 256) (layer name: layer, layer index: 1, layer type: DepthwiseConvolution2DLayer)
Testing layer normal channelwise with single channel test cnn2d 1 channel shape [256, 1, 28, 28] got exception Cannot do forward pass in DepthwiseConvolution2D layer (layer name = layer, layer index = 1): input array channels does not match CNN layer configuration (data format = NCHW, data input channels = 1, [minibatch,inputDepth,height,width]=[256, 1, 28, 28]; expected input channels = 256) (layer name: layer, layer index: 1, layer type: DepthwiseConvolution2DLayer)
#####################################
004 396 ms ___ Layer normal (?) 3D convolution shape [256, 1, 1, 28, 28]
005 538 ms ___ Layer normal convolution shape [256, 1, 28, 28]
004 392 ms ___ Layer simulated channelwise (?) cnn3d 1 channel shape [256, 1, 1, 28, 28]
004 387 ms ___ Layer simulated inverted channelwise (?) cnn3d 1 channel shape [256, 1, 1, 28, 28]
#####################################
#### 16 batch size 16 channels ####
<>
#####################################
000 485 ms ___ Layer normal (?) 3D convolution shape [16, 16, 1, 28, 28]
000 201 ms ___ Layer normal channelwise cnn2d 1 channel shape [16, 16, 28, 28]
000 716 ms ___ Layer normal convolution shape [16, 16, 28, 28]
000 368 ms ___ Layer normal separable convolution shape [16, 16, 28, 28]
004 295 ms ___ Layer simulated channelwise (?) cnn3d 1 channel shape [16, 1, 16, 28, 28]
005 544 ms ___ Layer simulated compact cnn2d 16 channels shape [256, 1, 28, 28]
004 266 ms ___ Layer simulated compact cnn3d 16 channels shape [16, 1, 16, 28, 28]
000 458 ms ___ Layer simulated inverted channelwise (?) cnn3d 1 channel shape [16, 16, 1, 28, 28]
#####################################
#### Additional 1 batch size 1 channels ####
Starting tests with minibatch size 1 nIn 1 nOut 256
03:54:13.039 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:54:13.041 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:54:13.041 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer normal convolution shape [1, 1, 28, 28] result millis 206
Testing layer normal (?) 3D convolution shape [1, 1, 1, 28, 28] result millis 94
Testing layer normal separable convolution shape [1, 1, 28, 28]shapeInfo Mismatched shape: [2, 729,256, 1,729, 8192,1,102]
Shape requested: : {729, 1}
got exception Op [sconv2d] execution failed
Skipping layer test 'simulated compact cnn2d 16 channels' channels = 0
Skipping layer test 'simulated compact cnn3d 16 channels' channels = 0
Skipping layer test 'normal separable simulated compact convolution' channels = 0
Testing layer simulated channelwise (?) cnn3d 1 channel shape [1, 1, 1, 28, 28] result millis 98
Testing layer simulated inverted channelwise (?) cnn3d 1 channel shape [1, 1, 1, 28, 28] result millis 94
03:54:13.659 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:54:13.660 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:54:13.660 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer simulated channelwise cnn2d 1 channel shape [1, 1, 28, 28] result millis 209
Testing layer normal channelwise cnn2d 1 channel shape [1, 1, 28, 28] result millis 95
Testing layer normal channelwise with single channel test cnn2d 1 channel shape [1, 1, 28, 28] result millis 91
#####################################
000 094 ms ___ Layer normal (?) 3D convolution shape [1, 1, 1, 28, 28]
000 095 ms ___ Layer normal channelwise cnn2d 1 channel shape [1, 1, 28, 28]
000 091 ms ___ Layer normal channelwise with single channel test cnn2d 1 channel shape [1, 1, 28, 28]
000 206 ms ___ Layer normal convolution shape [1, 1, 28, 28]
000 098 ms ___ Layer simulated channelwise (?) cnn3d 1 channel shape [1, 1, 1, 28, 28]
000 209 ms ___ Layer simulated channelwise cnn2d 1 channel shape [1, 1, 28, 28]
000 094 ms ___ Layer simulated inverted channelwise (?) cnn3d 1 channel shape [1, 1, 1, 28, 28]
#####################################
#### Additional 256 batch size 16 channels ####
Starting tests with minibatch size 256 nIn 16 nOut 256
03:54:14.143 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:54:14.144 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:54:14.144 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer normal convolution shape [256, 16, 28, 28] result millis 6092
Testing layer normal (?) 3D convolution shape [256, 16, 1, 28, 28] result millis 5172
Testing layer normal separable convolution shape [256, 16, 28, 28] result millis 3025
03:54:31.323 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:54:31.324 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:54:31.324 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer simulated compact cnn2d 16 channels shape [4096, 1, 28, 28]03:54:32.473 [main] WARN org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper - Error getting CuDNN forward algorithm - falling back on IMPLICIT_GEMM
03:54:32.499 [main] WARN org.deeplearning4j.nn.layers.convolution.ConvolutionLayer - CuDNN execution failed - falling back on built-in implementation
java.lang.RuntimeException: CuDNN error = 8: CUDNN_STATUS_EXECUTION_FAILED during forward pass - step cudnnConvolutionForward: inputShape=[4096, 1, 28, 28], weightsShape=[256, 1, 2, 2], biasShape=[1, 256], kernel=[2, 2], stride=[1, 1], padding=[0, 0], dilation=[1, 1], AlgoMode=USER_SPECIFIED, fwdAlgo=IMPLICIT_GEMM, convolutionMode=Truncate
<>
at PerformanceTest.main(PerformanceTest.java:45)
got exception cudaMalloc failed; Bytes: [6115296256]; Error code [2]; DEVICE [0]
Testing layer simulated compact cnn3d 16 channels shape [256, 1, 16, 28, 28] got exception Cannot invoke "org.nd4j.linalg.api.memory.pointers.PagedPointer.withOffset(long, long)" because the return value of "org.nd4j.linalg.api.memory.pointers.PointersPair.getDevicePointer()" is null
<> exception Cannot invoke "org.nd4j.linalg.api.memory.pointers.PagedPointer.withOffset(long, long)" because the return value of "org.nd4j.linalg.api.memory.pointers.PointersPair.getDevicePointer()" is null
03:54:33.906 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Attempting to initialize cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper
03:54:33.907 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - Cudnn helper org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
03:54:33.907 [main] DEBUG org.deeplearning4j.nn.layers.HelperUtils - org.deeplearning4j.cuda.convolution.CudnnConvolutionHelper successfully initialized
Testing layer simulated channelwise cnn2d 1 channel shape [4096, 1, 28, 28] got exception Cannot do forward pass in Convolution layer (layer name = layer, layer index = 1): input array channels does not match CNN layer configuration (data format = NCHW, data input channels = 1, [minibatch, channels, height, width]=[4096, 1, 28, 28]; expected input channels = 256) (layer name: layer, layer index: 1, layer type: ConvolutionLayer)
Testing layer normal channelwise cnn2d 1 channel shape [256, 16, 28, 28] got exception Cannot do forward pass in DepthwiseConvolution2D layer (layer name = layer, layer index = 1): input array channels does not match CNN layer configuration (data format = NCHW, data input channels = 16, [minibatch,inputDepth,height,width]=[256, 16, 28, 28]; expected input channels = 256) (layer name: layer, layer index: 1, layer type: DepthwiseConvolution2DLayer)
Testing layer normal channelwise with single channel test cnn2d 1 channel shape [4096, 1, 28, 28] got exception Cannot do forward pass in DepthwiseConvolution2D layer (layer name = layer, layer index = 1): input array channels does not match CNN layer configuration (data format = NCHW, data input channels = 1, [minibatch,inputDepth,height,width]=[4096, 1, 28, 28]; expected input channels = 256) (layer name: layer, layer index: 1, layer type: DepthwiseConvolution2DLayer)
#####################################
005 172 ms ___ Layer normal (?) 3D convolution shape [256, 16, 1, 28, 28]
006 092 ms ___ Layer normal convolution shape [256, 16, 28, 28]
003 025 ms ___ Layer normal separable convolution shape [256, 16, 28, 28]
#####################################
Done!
Results
#### Single batch ####
000 215 ms ___ Layer normal (?) 3D convolution shape [1, 256, 1, 28, 28]
000 214 ms ___ Layer normal channelwise with single channel test cnn2d 1 channel shape [256, 1, 28, 28]
003 792 ms ___ Layer normal convolution shape [1, 256, 28, 28]
000 251 ms ___ Layer normal separable convolution shape [1, 256, 28, 28]
000 406 ms ___ Layer normal separable simulated compact convolution shape [16, 16, 28, 28]
001 118 ms ___ Layer simulated channelwise (?) cnn3d 1 channel shape [1, 1, 256, 28, 28]
005 557 ms ___ Layer simulated channelwise cnn2d 1 channel shape [256, 1, 28, 28]
000 752 ms ___ Layer simulated compact cnn2d 16 channels shape [16, 16, 28, 28]
000 228 ms ___ Layer simulated compact cnn3d 16 channels shape [1, 16, 16, 28, 28]
000 167 ms ___ Layer simulated inverted channelwise (?) cnn3d 1 channel shape [1, 256, 1, 28, 28]
#### Single channel ####
004 396 ms ___ Layer normal (?) 3D convolution shape [256, 1, 1, 28, 28]
005 538 ms ___ Layer normal convolution shape [256, 1, 28, 28]
004 392 ms ___ Layer simulated channelwise (?) cnn3d 1 channel shape [256, 1, 1, 28, 28]
004 387 ms ___ Layer simulated inverted channelwise (?) cnn3d 1 channel shape [256, 1, 1, 28, 28]
#### 16 batch size 16 channels ####
000 485 ms ___ Layer normal (?) 3D convolution shape [16, 16, 1, 28, 28]
000 201 ms ___ Layer normal channelwise cnn2d 1 channel shape [16, 16, 28, 28]
000 716 ms ___ Layer normal convolution shape [16, 16, 28, 28]
000 368 ms ___ Layer normal separable convolution shape [16, 16, 28, 28]
004 295 ms ___ Layer simulated channelwise (?) cnn3d 1 channel shape [16, 1, 16, 28, 28]
005 544 ms ___ Layer simulated compact cnn2d 16 channels shape [256, 1, 28, 28]
004 266 ms ___ Layer simulated compact cnn3d 16 channels shape [16, 1, 16, 28, 28]
000 458 ms ___ Layer simulated inverted channelwise (?) cnn3d 1 channel shape [16, 16, 1, 28, 28]
#### Additional 1 batch size 1 channels ####
000 094 ms ___ Layer normal (?) 3D convolution shape [1, 1, 1, 28, 28]
000 095 ms ___ Layer normal channelwise cnn2d 1 channel shape [1, 1, 28, 28]
000 091 ms ___ Layer normal channelwise with single channel test cnn2d 1 channel shape [1, 1, 28, 28]
000 206 ms ___ Layer normal convolution shape [1, 1, 28, 28]
000 098 ms ___ Layer simulated channelwise (?) cnn3d 1 channel shape [1, 1, 1, 28, 28]
000 209 ms ___ Layer simulated channelwise cnn2d 1 channel shape [1, 1, 28, 28]
000 094 ms ___ Layer simulated inverted channelwise (?) cnn3d 1 channel shape [1, 1, 1, 28, 28]
#### Additional 256 batch size 16 channels ####
005 172 ms ___ Layer normal (?) 3D convolution shape [256, 16, 1, 28, 28]
006 092 ms ___ Layer normal convolution shape [256, 16, 28, 28]
003 025 ms ___ Layer normal separable convolution shape [256, 16, 28, 28]
Done!
Source code for the tests:
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.Map.Entry;
import java.util.TreeMap;
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.Convolution3D;
import org.deeplearning4j.nn.conf.layers.Convolution3D.DataFormat;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.DepthwiseConvolution2D;
import org.deeplearning4j.nn.conf.layers.GlobalPoolingLayer;
import org.deeplearning4j.nn.conf.layers.Layer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.SeparableConvolution2D;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.modelimport.keras.preprocessors.ReshapePreprocessor;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Nesterovs;
public class PerformanceTest
{
    // Sorted map so the per-run summary prints alphabetically by layer name.
    private static final TreeMap<String, Long> cache = new TreeMap<>();

    public static void main(final String[] args)
    {
        System.out.println("Starting tests");
        System.out.println("#### Single batch ####");
        performTests(1, 256, 256);
        System.out.println("#### Single channel ####");
        performTests(256, 1, 256);
        System.out.println("#### 16 batch size 16 channels ####");
        performTests(16, 16, 256);
        System.out.println("#### Additional 1 batch size 1 channels ####");
        performTests(1, 1, 256);
        System.out.println("#### Additional 256 batch size 16 channels ####");
        performTests(256, 16, 256);
        System.out.println("Done!");
    }
    private static INDArray getFeatures(final long minibatchSize, final long channels)
    {
        return Nd4j.ones(minibatchSize, channels, 28L, 28L);
    }
    private static ComputationGraph getGraph(final long nIn, final Layer layer)
    {
        final ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder().seed(-1).l2(0.0005) // ridge regression value
            .updater(new Nesterovs(1e-4)).weightInit(WeightInit.XAVIER).graphBuilder().addInputs("input")
            .setInputTypes(InputType.convolutional(28, 28, nIn)).setOutputs("output").layer("layer", layer, "input")
            .layer("ss", new GlobalPoolingLayer.Builder().build(), "layer")
            .layer("output", new OutputLayer.Builder().nOut(2L).build(), "ss").build();
        final ComputationGraph net = new ComputationGraph(conf);
        net.init();
        return net;
    }
    private static ComputationGraph getGraphCnn3D(final long channels, final long depth, final long nOut,
        final Layer layer)
    {
        final ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder().seed(-1).l2(0.0005) // ridge regression value
            .updater(new Nesterovs(1e-4)).weightInit(WeightInit.XAVIER).graphBuilder().addInputs("input")
            .setInputTypes(InputType.convolutional3D(DataFormat.NCDHW, depth, 28, 28, channels))
            .setOutputs("output").layer("layer", layer, "input")
            .layer("ss", new GlobalPoolingLayer.Builder().build(), "layer")
            // Collapse the depth axis back into channels so the 2D pooling layer accepts the 3D output.
            .inputPreProcessor("ss", new ReshapePreprocessor(new long[]
                { -1, nOut, depth, 27, 27 }, new long[]
                { -1, nOut * depth, 27, 27 }, true))
            // .inputPreProcessor("ss",
            //     new ComposableInputPreProcessor(new Cnn3DToFeedForwardPreProcessor((int) depth, 28, 28),
            //         new FeedForwardToCnnPreProcessor(28, 28, depth)))
            .layer("output", new OutputLayer.Builder().nOut(2L).build(), "ss").build();
        final ComputationGraph net = new ComputationGraph(conf);
        net.init();
        return net;
    }
    private static void performTests(final long minibatchSize, final long nIn, final long nOut)
    {
        System.out.println("Starting tests with minibatch size " + minibatchSize + " nIn " + nIn + " nOut " + nOut);
        cache.clear();
        // normal
        test(0, minibatchSize, nIn, nOut);
        test(1, minibatchSize, nIn, nOut);
        test(2, minibatchSize, nIn, nOut);
        // compact
        test(3, minibatchSize, nIn, nOut);
        test(4, minibatchSize, nIn, nOut);
        test(9, minibatchSize, nIn, nOut);
        // depthwise
        test(5, minibatchSize, nIn, nOut);
        test(6, minibatchSize, nIn, nOut);
        test(7, minibatchSize, nIn, nOut);
        test(8, minibatchSize, nIn, nOut);
        test(10, minibatchSize, nIn, nOut);
        System.out.println("#####################################");
        final NumberFormat integerInstance = NumberFormat.getIntegerInstance();
        integerInstance.setMinimumIntegerDigits(6);
        for (final Entry<String, Long> entry : cache.entrySet())
        {
            System.out.println(integerInstance.format(entry.getValue())
                + /*new String(new byte[64 - entry.getKey().length()]) +*/ " ms ___ " + entry.getKey());
        }
        System.out.println("#####################################");
    }
    private static void test(final ComputationGraph graph, final String layerName, final INDArray features)
    {
        System.out.print("Testing layer " + layerName + " shape " + Arrays.toString(features.shape()));
        try
        {
            // Warm-up: 20 untimed forward passes so JIT compilation and cuDNN
            // algorithm selection settle before measuring.
            for (int i = 0; i < 20; i++)
            {
                graph.output(features);
            }
            // Timed: 100 forward passes, reported as total wall-clock milliseconds.
            final long t0 = System.currentTimeMillis();
            for (int i = 0; i < 100; i++)
            {
                graph.output(features);
            }
            final long td = System.currentTimeMillis() - t0;
            cache.put("Layer " + layerName + " shape " + Arrays.toString(features.shape()), td);
            System.out.println(" result millis " + td);
        }
        catch (final Exception e)
        {
            System.out.println(" got exception " + e.getMessage());
        }
    }
    private static void test(final int index, final long minibatchSize, final long nIn, final long nOut)
    {
        switch (index)
        {
        case 0:
            test(getGraph(nIn, new ConvolutionLayer.Builder().kernelSize(2, 2).nOut(nOut).build()),
                "normal convolution", getFeatures(minibatchSize, nIn));
            break;
        case 1:
            test(getGraphCnn3D(nIn, 1, nOut, new Convolution3D.Builder().kernelSize(1, 2, 2).nOut(nOut).build()),
                "normal (?) 3D convolution", Nd4j.ones(minibatchSize, nIn, 1, 28, 28));
            break;
        case 2:
            test(getGraph(nIn, new SeparableConvolution2D.Builder().kernelSize(2, 2).nOut(nOut).build()),
                "normal separable convolution", getFeatures(minibatchSize, nIn));
            break;
        case 3:
            if (nIn / 16 == 0)
            {
                System.out.println("Skipping layer test 'simulated compact cnn2d 16 channels' channels = 0");
                break;
            }
            // Fold groups of 16 channels into the batch axis: [b, c, h, w] -> [b * 16, c / 16, h, w].
            test(getGraph(nIn / 16, new ConvolutionLayer.Builder().kernelSize(2, 2).nOut(nOut).build()),
                "simulated compact cnn2d 16 channels", getFeatures(minibatchSize * 16, nIn / 16));
            break;
        case 4:
            if (nIn / 16 == 0)
            {
                System.out.println("Skipping layer test 'simulated compact cnn3d 16 channels' channels = 0");
                break;
            }
            test(getGraphCnn3D(nIn / 16, 16, nOut,
                new Convolution3D.Builder().kernelSize(1, 2, 2).nOut(nOut).build()),
                "simulated compact cnn3d 16 channels", Nd4j.ones(minibatchSize, nIn / 16, 16, 28, 28));
            break;
        case 5:
            test(getGraphCnn3D(1, nIn, nOut, new Convolution3D.Builder().kernelSize(1, 2, 2).nOut(nOut).build()),
                "simulated channelwise (?) cnn3d 1 channel", Nd4j.ones(minibatchSize, 1, nIn, 28, 28));
            break;
        case 6:
            test(getGraphCnn3D(nIn, 1, nOut, new Convolution3D.Builder().kernelSize(1, 2, 2).nOut(nOut).build()),
                "simulated inverted channelwise (?) cnn3d 1 channel", Nd4j.ones(minibatchSize, nIn, 1, 28, 28));
            break;
        case 7:
            // Note: passes minibatchSize as the graph's nIn while the features have 1 channel
            // (likely should be getGraph(1, ...)), which is why this test throws a
            // channel-mismatch exception whenever minibatchSize != 1.
            test(getGraph(minibatchSize, new ConvolutionLayer.Builder().kernelSize(2, 2).nOut(nOut).build()),
                "simulated channelwise cnn2d 1 channel", getFeatures(minibatchSize * nIn, 1L));
            break;
        case 8:
            // Note: passes minibatchSize as the graph's nIn while the features have nIn channels
            // (likely should be getGraph(nIn, ...)), so this only runs when minibatchSize == nIn;
            // see the exceptions in the output above.
            test(getGraph(minibatchSize, new DepthwiseConvolution2D.Builder().kernelSize(2, 2).nOut(nOut).build()),
                "normal channelwise cnn2d 1 channel", getFeatures(minibatchSize, nIn));
            break;
        case 9:
            if (nIn / 16 == 0)
            {
                System.out.println(
                    "Skipping layer test 'normal separable simulated compact convolution' channels = 0");
                break;
            }
            test(getGraph(nIn / 16, new SeparableConvolution2D.Builder().kernelSize(2, 2).nOut(nOut).build()),
                "normal separable simulated compact convolution", getFeatures(minibatchSize * 16, nIn / 16));
            break;
        case 10:
            // Note: same nIn mismatch as case 7; only runs when minibatchSize == 1.
            test(getGraph(minibatchSize, new DepthwiseConvolution2D.Builder().kernelSize(2, 2).nOut(nOut).build()),
                "normal channelwise with single channel test cnn2d 1 channel",
                getFeatures(minibatchSize * nIn, 1L));
            break;
        default:
            throw new IllegalArgumentException("No such index " + index);
        }
    }
}