Hello everyone, I am working on a project that uses a YOLOv2 model wrapped in ParallelInference.
Hardware properties:
OS: Windows 10
RAM: 16GB
GPU: 2x RTX 3060 Ti
CUDA: 11.6
My pom.xml file:
-----------------------------------------pom.xml-------------------------------------------------
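<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.7.2</version>
</parent>
<groupId>com.matcon</groupId>
<artifactId>yolo-v2-runner</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>yolo-v2-runner</name>
<description>yolo-v2-runner</description>
<properties>
<java.version>11</java.version>
<maven.compiler.source>11</maven.compiler.source>
<maven.compiler.target>11</maven.compiler.target>
<dl4j-master.version>1.0.0-M2.1</dl4j-master.version>
<nd4j-master.version>1.0.0-M2.1</nd4j-master.version>
<cuda-version>11.6</cuda-version>
<maven-shade-plugin.version>3.3.0</maven-shade-plugin.version>
<maven.minimum.version>3.3.1</maven.minimum.version>
<exec-maven-plugin.version>1.4.0</exec-maven-plugin.version>
<shaded.classifier>bin</shaded.classifier>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-validation</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-core</artifactId>
<version>${dl4j-master.version}</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-native-platform</artifactId>
<version>${nd4j-master.version}</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-cuda-${cuda-version}-platform</artifactId>
<version>${nd4j-master.version}</version>
</dependency>
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-parallel-wrapper</artifactId>
<version>${dl4j-master.version}</version>
</dependency>
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-zoo</artifactId>
<version>${dl4j-master.version}</version>
</dependency>
<dependency>
<groupId>org.bytedeco</groupId>
<artifactId>opencv-platform</artifactId>
<version>4.5.5-1.5.7</version>
</dependency>
<!-- $CUDA_VERSION-$CUDNN_VERSIUON-$JAVACPP_VERSION-->
</dependencies>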
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>${exec-maven-plugin.version}</version>
<executions>
<execution>
<goals>
<goal>exec</goal>
</goals>
</execution>
</executions>
<configuration>
<executable>java</executable>
</configuration>
</plugin>
<plugin>
<groupId>com.lewisd</groupId>
<artifactId>lint-maven-plugin</artifactId>
<version>0.0.11</version>
<configuration>
<failOnViolation>true</failOnViolation>
<onlyRunRules>
<rule>DuplicateDep</rule>
<rule>RedundantPluginVersion</rule>
<!-- Rules incompatible with Java 9
<rule>VersionProp</rule>
<rule>DotVersionProperty</rule> -->
</onlyRunRules>
</configuration>
<executions>
<execution>
<id>pom-lint</id>
<phase>validate</phase>
<goals>
<goal>check</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-enforcer-plugin</artifactId>
<version>1.0.1</version>
<executions>
<execution>
<id>enforce-default</id>
<goals>
<goal>enforce</goal>
</goals>
<configuration>
<rules>
<requireMavenVersion>
<version>[${maven.minimum.version},)</version>
<message>********** Minimum Maven Version is ${maven.minimum.version}. Please
upgrade Maven before continuing (run "mvn --version" to check). **********
</message>
</requireMavenVersion>
</rules>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.0.0-M5</version>
<inherited>true</inherited>
<dependencies>
<dependency>
<groupId>org.apache.maven.surefire</groupId>
<artifactId>surefire-junit-platform</artifactId>
<version>3.0.0-M5</version>
</dependency>
</dependencies>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>com.lewisd</groupId>
<artifactId>lint-maven-plugin</artifactId>
<versionRange>[0.0.11,)</versionRange>
<goals>
<goal>check</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
-----------------------------------------pom.xml-------------------------------------------------
Below is the output at application start:
-----------------------------------------app logs-------------------------------------------------
2022-09-29 13:27:09.846 INFO 1796 --- [ main] org.nd4j.linalg.factory.Nd4jBackend : Loaded [JCublasBackend] backend
2022-09-29 13:27:13.762 INFO 1796 --- [ main] org.nd4j.nativeblas.NativeOpsHolder : Number of threads used for linear algebra: 32
2022-09-29 13:27:13.826 INFO 1796 --- [ main] o.n.l.a.o.e.DefaultOpExecutioner : Backend used: [CUDA]; OS: [Windows 10]
2022-09-29 13:27:13.826 INFO 1796 --- [ main] o.n.l.a.o.e.DefaultOpExecutioner : Cores: [4]; Memory: [2.0GB];
2022-09-29 13:27:13.826 INFO 1796 --- [ main] o.n.l.a.o.e.DefaultOpExecutioner : Blas vendor: [CUBLAS]
2022-09-29 13:27:13.837 INFO 1796 --- [ main] org.nd4j.linalg.jcublas.JCublasBackend : ND4J CUDA build version: 11.6.55
2022-09-29 13:27:13.840 INFO 1796 --- [ main] org.nd4j.linalg.jcublas.JCublasBackend : CUDA device 0: [NVIDIA GeForce RTX 3060 Ti]; cc: [8.6]; Total memory: [8589279232]
2022-09-29 13:27:13.840 INFO 1796 --- [ main] org.nd4j.linalg.jcublas.JCublasBackend : CUDA device 1: [NVIDIA GeForce RTX 3060 Ti]; cc: [8.6]; Total memory: [8589410304]
2022-09-29 13:27:13.840 INFO 1796 --- [ main] org.nd4j.linalg.jcublas.JCublasBackend : Backend build information:
MSVC: 192930146
STD version: 201402L
DEFAULT_ENGINE: samediff::ENGINE_CUDA
HAVE_FLATBUFFERS
-----------------------------------------app logs-------------------------------------------------
My project works fine with one GPU, but after I added a second one I encountered the following error:
-----------------------------------------exception------------------------------------------------
java.lang.RuntimeException: MmulHelper::mmulMxM cuda failed !; Error code: [700]
at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:2067) ~[nd4j-cuda-11.6-1.0.0-M2.1.jar:na]
at org.nd4j.linalg.jcublas.ops.executioner.CudaExecutioner.exec(CudaExecutioner.java:1870) ~[nd4j-cuda-11.6-1.0.0-M2.1.jar:na]
at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6545) ~[nd4j-api-1.0.0-M2.1.jar:na]
at org.nd4j.linalg.api.blas.impl.BaseLevel3.gemm(BaseLevel3.java:62) ~[nd4j-api-1.0.0-M2.1.jar:na]
at org.nd4j.linalg.api.ndarray.BaseNDArray.mmuli(BaseNDArray.java:3202) ~[nd4j-api-1.0.0-M2.1.jar:na]
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.preOutput(ConvolutionLayer.java:473) ~[deeplearning4j-nn-1.0.0-M2.1.jar:na]
at org.deeplearning4j.nn.layers.convolution.ConvolutionLayer.activate(ConvolutionLayer.java:509) ~[deeplearning4j-nn-1.0.0-M2.1.jar:na]
at org.deeplearning4j.nn.graph.vertex.impl.LayerVertex.doForward(LayerVertex.java:110) ~[deeplearning4j-nn-1.0.0-M2.1.jar:na]
at org.deeplearning4j.nn.graph.ComputationGraph.outputOfLayersDetached(ComputationGraph.java:2450) ~[deeplearning4j-nn-1.0.0-M2.1.jar:na]
at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1752) ~[deeplearning4j-nn-1.0.0-M2.1.jar:na]
at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1708) ~[deeplearning4j-nn-1.0.0-M2.1.jar:na]
at org.deeplearning4j.nn.graph.ComputationGraph.output(ComputationGraph.java:1694) ~[deeplearning4j-nn-1.0.0-M2.1.jar:na]
at org.deeplearning4j.parallelism.InplaceParallelInference$ModelHolder.output(InplaceParallelInference.java:267) ~[deeplearning4j-parallel-wrapper-1.0.0-M2.1.jar:na]
at org.deeplearning4j.parallelism.InplaceParallelInference$ModelSelector.output(InplaceParallelInference.java:150) ~[deeplearning4j-parallel-wrapper-1.0.0-M2.1.jar:na]
at org.deeplearning4j.parallelism.InplaceParallelInference.output(InplaceParallelInference.java:91) ~[deeplearning4j-parallel-wrapper-1.0.0-M2.1.jar:na]
at org.deeplearning4j.parallelism.ParallelInference.output(ParallelInference.java:191) ~[deeplearning4j-parallel-wrapper-1.0.0-M2.1.jar:na]
at org.deeplearning4j.parallelism.ParallelInference.output(ParallelInference.java:187) ~[deeplearning4j-parallel-wrapper-1.0.0-M2.1.jar:na]
-----------------------------------------exception------------------------------------------------