RNN and FF getting mixed up in MergeVertex, and new FileStatsStorage not working

My MergeVertex is complaining that it is receiving layers of different types as input, when they should all be RNN: I am using a 1D CNN with an input vector of 483 values.

	ComputationGraphConfiguration.GraphBuilder graph = new NeuralNetConfiguration.Builder()
			.seed(seed)
			.activation(Activation.SWISH)
			.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
			.updater(new Adam(0.0003, 0.9, 0.999, 0.1))
			.weightInit(WeightInit.XAVIER)
			.miniBatch(true)
			.cacheMode(CacheMode.NONE)
			.trainingWorkspaceMode(WorkspaceMode.ENABLED)
			.inferenceWorkspaceMode(WorkspaceMode.ENABLED)
			.convolutionMode(ConvolutionMode.Causal)
			.graphBuilder();
	graph.setInputTypes(InputType.recurrent(483));
	//stem
	graph
		.addLayer("stem-cnn1",new Convolution1DLayer.Builder(3,2).nIn(483).nOut(32).build(),"input")

		.addLayer("stem-batch1", new BatchNormalization.Builder(false).decay(0.995).eps(0.001).nIn(32).nOut(32).build(),"stem-cnn1")

		.addLayer("stem-cnn2",new Convolution1DLayer.Builder(3).nIn(32).nOut(32).build(),"stem-batch1")

		.addLayer("stem-batch2",new BatchNormalization.Builder(false).decay(0.995).eps(0.001).nIn(32).nOut(32).build(),"stem-cnn2")

		.addLayer("stem-cnn3",new Convolution1DLayer.Builder(3).nIn(32).nOut(64).build(),"stem-batch2")

        .addLayer("stem-batch3", new BatchNormalization.Builder(false).decay(0.995).eps(0.001).nIn(64).nOut(64).build(), "stem-cnn3")
        //left branch
        .addLayer("stem-pool1",new Subsampling1DLayer.Builder(Subsampling1DLayer.PoolingType.MAX, 3, 2).build(),"stem-batch3")
        //right branch
        .addLayer("stem-cnn4",new Convolution1DLayer.Builder(3,2).nIn(64).nOut(96).build(),"stem-batch3")

        .addLayer("stem-batch4", new BatchNormalization.Builder(false).decay(0.995).eps(0.001).nIn(96).nOut(96).build(), "stem-cnn4")
        //merge
        .addVertex("concat1", new MergeVertex(),"stem-pool1", "stem-batch4")

the error>

Invalid input: MergeVertex cannot merge activations of different types: first type = RNN, input type 2 = FF
at org.deeplearning4j.nn.conf.graph.MergeVertex.getOutputType(MergeVertex.java:139)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.getLayerActivationTypes(ComputationGraphConfiguration.java:537)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration.addPreProcessors(ComputationGraphConfiguration.java:450)
at org.deeplearning4j.nn.conf.ComputationGraphConfiguration$GraphBuilder.build(ComputationGraphConfiguration.java:1202)

For some reason BatchNormalization is reporting a feed-forward output type, which is messing things up. I assume BatchNormalization is supported for 1D CNNs? Is some preprocessor perhaps not getting triggered?

On an unrelated issue:

new FileStatsStorage(new File("/home/workspace/netStats/test1.dat")); to attach to the UI server is not working. It complains that the file does not exist, and if I create an empty text file with the same name it complains that it is not a valid MapDB database. So how do I create the file in the first place for saving the stats? new InMemoryStatsStorage(); works fine.
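For what it's worth, one thing I checked is whether the parent directory exists before constructing the storage: as far as I can tell, FileStatsStorage creates the MapDB .dat file itself on first use, but (my assumption) it will not create missing parent directories. A minimal sketch, with a temp-dir path used purely for illustration and the FileStatsStorage line left commented since it needs deeplearning4j-ui on the classpath:

```java
import java.io.File;

public class StatsFileSetup {
    public static void main(String[] args) {
        // Hypothetical location for illustration; substitute your own path.
        File statsFile = new File(System.getProperty("java.io.tmpdir"),
                "netStats/test1.dat");
        // Create the parent directory, not the file itself: FileStatsStorage
        // initialises the MapDB .dat file on first use, but (assumption) it
        // does not create missing parent directories, and a hand-made empty
        // file is not a valid MapDB database.
        statsFile.getParentFile().mkdirs();
        // StatsStorage statsStorage = new FileStatsStorage(statsFile);
        System.out.println(statsFile.getParentFile().isDirectory());
    }
}
```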

Thanks in advance

I have found out that the merge vertex works fine if the inputs are Convolution1DLayers and pooling layers; I can thus only apply BatchNormalization successfully AFTER the merge vertex, like so>

        //left branch
        .addLayer("stem-pool1",new Subsampling1DLayer.Builder(Subsampling1DLayer.PoolingType.MAX, 3, 2).build(),"stem-batch3")
        //right branch
        .addLayer("stem-cnn4",new Convolution1DLayer.Builder(3,2).nIn(64).nOut(96).build(),"stem-batch3")
        //merge
        .addVertex("concat1", new MergeVertex(),"stem-pool1", "stem-cnn4")
        .addLayer("stem-batch4", new BatchNormalization.Builder(false).decay(0.995).eps(0.001).build(), "concat1")

This is problematic because if I have a merge vertex with two or three Convolution1DLayers and a pooling layer, I can't apply BatchNormalization after each Convolution1DLayer as recommended, only once after the merge vertex. That may have unintended consequences for training which I have not yet run into, such as exploding gradients.

The strange thing is that if my merge vertex contains only BatchNormalization layers and no pooling layers, it works fine! It is the specific combination of pooling and BatchNormalization layers feeding a merge vertex that generates the error.

I am guessing this is a bug?

@MPdaedalus mind filing an issue with a stripped down example that reproduces the issue? Thanks!
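Sure. A stripped-down sketch along the lines of the config above, in case it is useful for the issue (untested as written here; toy layer sizes, same beta7 API and imports as my full config):

```java
import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.ConvolutionMode;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.graph.MergeVertex;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.BatchNormalization;
import org.deeplearning4j.nn.conf.layers.Convolution1DLayer;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.conf.layers.Subsampling1DLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class MergeVertexRepro {
    public static void main(String[] args) {
        // One 1D conv feeding a pooling branch and a conv+batch-norm branch,
        // then merged. In beta7 this reportedly fails at build() with:
        // "MergeVertex cannot merge activations of different types:
        //  first type = RNN, input type 2 = FF"
        ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
                .convolutionMode(ConvolutionMode.Causal)
                .graphBuilder()
                .addInputs("input")
                .setInputTypes(InputType.recurrent(483))
                .addLayer("cnn1", new Convolution1DLayer.Builder(3).nOut(8).build(), "input")
                // left branch: pooling only
                .addLayer("pool", new Subsampling1DLayer.Builder(
                        Subsampling1DLayer.PoolingType.MAX, 3, 2).build(), "cnn1")
                // right branch: conv followed by batch norm
                .addLayer("cnn2", new Convolution1DLayer.Builder(3, 2).nOut(8).build(), "cnn1")
                .addLayer("bn", new BatchNormalization.Builder().nOut(8).build(), "cnn2")
                .addVertex("merge", new MergeVertex(), "pool", "bn")
                .addLayer("out", new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .activation(Activation.IDENTITY).nOut(2).build(), "merge")
                .setOutputs("out")
                .build(); // <- exception is thrown here
    }
}
```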

Thought I'd test it with the latest snapshot before filing a bug report, in case it has already been fixed, but javacpp is not playing ball with the snapshot. Which version of cuda-platform-redist do I need to use for the snapshot? It seems only CUDA 11.2 uses javacpp 1.5.5:

https://mvnrepository.com/artifact/org.bytedeco/cuda-platform-redist

but deeplearning4j only supports up to 10.2?

Warning: Versions of org.bytedeco:javacpp:1.5.5 and org.bytedeco:cuda:10.2-7.6-1.5.3 do not match.
23:31:02.142 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
23:31:02.153 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.

I don't know why it is referencing the old version I was using with beta7, as my POM references the new version and pulled in 3GB or so of new files, and I ran a Maven clean and force update.

Here are the important parts of my pom.xml:

	<dependency>
		<groupId>org.deeplearning4j</groupId>
		<artifactId>deeplearning4j-ui</artifactId>
		<version>${dl4j.version}</version>
	</dependency>
	<dependency>
		<groupId>org.eclipse.collections</groupId>
		<artifactId>eclipse-collections-api</artifactId>
		<version>10.4.0</version>
	</dependency>
	<dependency>
		<groupId>org.eclipse.collections</groupId>
		<artifactId>eclipse-collections</artifactId>
		<version>10.4.0</version>
	</dependency>
	<dependency>
		<groupId>org.nd4j</groupId>
		<artifactId>nd4j-cuda-10.2</artifactId>
		<version>${dl4j.version}</version>
	</dependency>
	<dependency>
		<groupId>org.deeplearning4j</groupId>
		<artifactId>deeplearning4j-core</artifactId>
		<version>${dl4j.version}</version>
	</dependency>
	<dependency>
		<groupId>org.deeplearning4j</groupId>
		<artifactId>deeplearning4j-cuda-10.2</artifactId>
		<version>${dl4j.version}</version>
	</dependency>
	<dependency>
		<groupId>org.bytedeco</groupId>
		<artifactId>cuda-platform-redist</artifactId>
		<version>11.2-8.1-1.5.5</version>
	</dependency>
	<dependency>
		<groupId>org.deeplearning4j</groupId>
		<artifactId>deeplearning4j-zoo</artifactId>
		<version>${dl4j.version}</version>
	</dependency>
	<dependency>
		<groupId>org.deeplearning4j</groupId>
		<artifactId>deeplearning4j-datavec-iterators</artifactId>
		<version>${dl4j.version}</version>
	</dependency>
	<dependency>
		<groupId>org.datavec</groupId>
		<artifactId>datavec-local</artifactId>
		<version>${dl4j.version}</version>
	</dependency>
</dependencies>

using <dl4j.version>1.0.0-SNAPSHOT</dl4j.version>
<nd4j.version>1.0.0-SNAPSHOT</nd4j.version>

For CUDA 10.2, these instructions here are up-to-date:
https://deeplearning4j.konduit.ai/config/backends/config-cudnn
You can ignore the warning about the versions, that’s not a problem.

OK, the bug is fixed in the snapshot: I can mix BatchNormalization and pooling with no problem, and the model compiles fine with the snapshot but not with beta7. Hopefully the bug is also fixed in beta8, due any time now?

There is either still a problem with my CUDA backend (it works fine in beta7) or something else is going wrong, as I'm getting a java.lang.UnsupportedOperationException when Nd4j.create is called with my training data, or earlier when using:

model.setListeners(new StatsListener(statsStorage), new ScoreIterationListener(10));

This does not occur if I switch back to beta7 or use the CPU backend; my pom.xml is the same as in the previous post.

Warning: Versions of org.bytedeco:javacpp:1.5.5 and org.bytedeco:cuda:10.2-7.6-1.5.3 do not match.
23:43:43.788 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
23:43:43.798 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
23:43:43.799 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.
23:43:43.855 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
23:43:43.856 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.
23:43:45.711 [main] INFO org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 32
23:43:45.740 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Linux]
23:43:45.740 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [4]; Memory: [8.0GB];
23:43:45.741 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
23:43:45.750 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 10.2.89
23:43:45.752 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [GeForce GTX 1060 6GB]; cc: [6.1]; Total memory: [6373179392]
Exception in thread "main" java.lang.UnsupportedOperationException
at org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner.createShapeInfo(DefaultOpExecutioner.java:945)
at org.nd4j.linalg.api.shape.Shape.createShapeInformation(Shape.java:3279)
at org.nd4j.linalg.api.ndarray.BaseShapeInfoProvider.createShapeInformation(BaseShapeInfoProvider.java:75)
at org.nd4j.jita.constant.ProtectedCudaShapeInfoProvider.createShapeInformation(ProtectedCudaShapeInfoProvider.java:92)
at org.nd4j.jita.constant.ProtectedCudaShapeInfoProvider.createShapeInformation(ProtectedCudaShapeInfoProvider.java:73)
at org.nd4j.linalg.jcublas.CachedShapeInfoProvider.createShapeInformation(CachedShapeInfoProvider.java:42)
at org.nd4j.linalg.api.ndarray.BaseNDArray.&lt;init&gt;(BaseNDArray.java:166)
at org.nd4j.linalg.api.ndarray.BaseNDArray.&lt;init&gt;(BaseNDArray.java:234)
at org.nd4j.linalg.api.ndarray.BaseNDArray.&lt;init&gt;(BaseNDArray.java:225)
at org.nd4j.linalg.jcublas.JCublasNDArray.&lt;init&gt;(JCublasNDArray.java:72)
at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:151)
at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3445)

Just for clarification: if I'm using cuda-platform-redist, I don't need CUDA installed on my Linux system, just the X11 CUDA drivers?

@MPdaedalus hm, it should be hitting this method, not the super method:

Yes, it's coming soon, but no ETA yet. Ideally end of month at the latest. I'm still auditing the tests (we've had a bit of technical debt accrue that I'm currently cleaning up).

OK, the error has now changed with the latest snapshot, but still no luck:

23:20:09.587 [main] INFO org.nd4j.linalg.factory.Nd4jBackend - Loaded [JCublasBackend] backend
23:20:09.596 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
23:20:09.597 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.
23:20:09.650 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.nd4j.linalg.jblas.JblasBackend] of provided class-loader.
23:20:09.651 [main] ERROR org.nd4j.common.config.ND4JClassLoading - Cannot find class [org.canova.api.io.data.DoubleWritable] of provided class-loader.
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.nd4j.jita.concurrency.CudaAffinityManager.getNumberOfDevices(CudaAffinityManager.java:132)
at org.nd4j.jita.constant.ConstantProtector.purgeProtector(ConstantProtector.java:56)
at org.nd4j.jita.constant.ConstantProtector.&lt;init&gt;(ConstantProtector.java:49)
at org.nd4j.jita.constant.ConstantProtector.&lt;clinit&gt;(ConstantProtector.java:37)
at org.nd4j.jita.constant.ProtectedCudaConstantHandler.&lt;clinit&gt;(ProtectedCudaConstantHandler.java:65)
at org.nd4j.jita.constant.CudaConstantHandler.&lt;clinit&gt;(CudaConstantHandler.java:34)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:468)
at org.nd4j.common.config.ND4JClassLoading.loadClassByName(ND4JClassLoading.java:62)
at org.nd4j.common.config.ND4JClassLoading.loadClassByName(ND4JClassLoading.java:56)
at org.nd4j.linalg.factory.Nd4j.initWithBackend(Nd4j.java:5152)
at org.nd4j.linalg.factory.Nd4j.initContext(Nd4j.java:5093)
at org.nd4j.linalg.factory.Nd4j.&lt;clinit&gt;(Nd4j.java:270)
at org.nd4j.linalg.dataset.DataSet.&lt;init&gt;(DataSet.java:111)
at org.nd4j.linalg.dataset.DataSet.&lt;init&gt;(DataSet.java:94)
at org.nd4j.linalg.dataset.DataSet.&lt;init&gt;(DataSet.java:67)
at main.PatternDetect.getTrainingData(PatternDetect.java:145)
at main.PatternDetect.run(PatternDetect.java:60)
at main.Start.main(Start.java:15)
Caused by: java.lang.RuntimeException: ND4J is probably missing dependencies. For more information, please refer to: https://deeplearning4j.konduit.ai/nd4j/backend
at org.nd4j.nativeblas.NativeOpsHolder.&lt;init&gt;(NativeOpsHolder.java:116)
at org.nd4j.nativeblas.NativeOpsHolder.&lt;clinit&gt;(NativeOpsHolder.java:37)
… 19 more
Caused by: java.lang.UnsatisfiedLinkError: no jnind4jcuda in java.library.path: /usr/local/cuda/lib64:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2447)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:809)
at java.base/java.lang.System.loadLibrary(System.java:1893)
at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1631)
at org.bytedeco.javacpp.Loader.load(Loader.java:1265)
at org.bytedeco.javacpp.Loader.load(Loader.java:1109)
at org.nd4j.nativeblas.Nd4jCuda.&lt;clinit&gt;(Nd4jCuda.java:10)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:468)
at org.nd4j.common.config.ND4JClassLoading.loadClassByName(ND4JClassLoading.java:62)
at org.nd4j.common.config.ND4JClassLoading.loadClassByName(ND4JClassLoading.java:56)
at org.nd4j.nativeblas.NativeOpsHolder.&lt;init&gt;(NativeOpsHolder.java:88)
… 20 more
Caused by: java.lang.UnsatisfiedLinkError: no nd4jcuda in java.library.path: /usr/local/cuda/lib64:/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2447)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:809)
at java.base/java.lang.System.loadLibrary(System.java:1893)
at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1631)
at org.bytedeco.javacpp.Loader.load(Loader.java:1213)
… 27 more

I don’t understand why it is looking for the native libs on my machine when I have cuda-platform-redist in my pom.xml

	<dependency>
		<groupId>org.bytedeco</groupId>
		<artifactId>cuda-platform-redist</artifactId>
		<version>10.2-7.6-1.5.3</version>
	</dependency>
	<dependency>
		<groupId>org.deeplearning4j</groupId>
		<artifactId>deeplearning4j-cuda-10.2</artifactId>
		<version>${dl4j.version}</version>
	</dependency>
	<dependency>
		<groupId>org.nd4j</groupId>
		<artifactId>nd4j-cuda-10.2</artifactId>
		<version>${dl4j.version}</version>
	</dependency>

I should add that I have installed CUDA 10.2 and cuDNN 7.6.5 and made sure libcudnn.so.7.6.5 is in /usr/local/cuda/lib64, but it makes no difference.

Have the snapshots been tested to work correctly in Eclipse, not just the JetBrains IDE? I'm running out of ideas. Or is the snapshot just so unstable that it can't be used?

Further investigation has found that if I use the "-platform" artifact instead of just "nd4j-cuda-10.2", as mentioned in the docs, the error in the previous post disappears and I am back to the earlier UnsupportedOperationException in createShapeInfo:

Exception in thread "main" java.lang.UnsupportedOperationException
at org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner.createShapeInfo(DefaultOpExecutioner.java:945)

This error also comes up for everything I run, such as the LinearDataClassifier and IrisClassifier examples. It also comes up regardless of whether I am using cuda-platform-redist or relying on the native binaries in my /usr/local/cuda/lib64 folder, which seems to indicate it is a problem with the snapshot and not with my CUDA backend (as I said before, the CUDA backend works fine in beta7), so I'm going to file a bug report for the error.

	<dependency>
		<groupId>org.nd4j</groupId>
		<artifactId>nd4j-cuda-10.2-platform</artifactId>
		<version>${dl4j.version}</version>
	</dependency>

However, there seem to be builds missing for certain OSes, which causes Maven and Eclipse to complain with the platform release. I don't use them, but they are needed for a proper build.

Missing artifact org.nd4j:nd4j-cuda-10.2:jar:linux-ppc64le:1.0.0-SNAPSHOT
Missing artifact org.nd4j:nd4j-cuda-10.2:jar:windows-x86_64:1.0.0-SNAPSHOT
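If the missing classifiers are the only blocker, one possible workaround (an assumption on my part, based on how JavaCPP's -platform artifacts select their classifiers) is to pin the platform so Maven only resolves the artifact for the current OS:

```xml
<!-- Restrict JavaCPP -platform artifacts (including nd4j-cuda-10.2-platform)
     to a single OS/arch, so Maven does not try to resolve snapshot classifiers
     that were never published. Can also be passed on the command line as
     -Djavacpp.platform=linux-x86_64 -->
<properties>
    <javacpp.platform>linux-x86_64</javacpp.platform>
</properties>
```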

The problem is fixed in the CUDA 11.2 snapshot.