DL4J and ksqlDB: could not create cache

I am using ksqlDB to import a Keras model and run inference, and for some reason JavaCPP fails to set up its cache, even though the home-folder cache exists. Even after adding a system property pointing the cache to a custom directory, the folder is created but nothing gets written to it.

I posted the same issue on the ksqlDB side; just curious whether anybody has an idea of what the root cause may be.

[2020-05-09 14:05:24,395] ERROR {"type":1,"deserializationError":null,"recordProcessingError":{"errorMessage":"Error computing expression ENGINEANOMALY(TEMP1, TEMP2, TEMP3, TEMP4, TEMP5) for column MSE with index 6: Failed to invoke function public double com.dellemc.ksql.functions.EngineAnomaly.engineAnomaly(double,double,double,double,double)","record":null,"cause":["Failed to invoke function public double com.dellemc.ksql.functions.EngineAnomaly.engineAnomaly(double,double,double,double,double)","java.lang.reflect.InvocationTargetException","java.io.IOException: Could not create the cache: Set the \"org.bytedeco.javacpp.cachedir\" system property.","Could not create the cache: Set the \"org.bytedeco.javacpp.cachedir\" system property."]},"productionError":null} (processing.6012138171148277256.Project:44)

If the Java process is running as a user that does not have a home directory (or lacks permissions to access certain directories), you will have to set the cachedir to a directory that it can access.

The first thing I'd do is figure out which user is running the Java code and which directories it has access to.
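A quick diagnostic along those lines (my own sketch, not part of JavaCPP; the fallback path mirrors what I understand JavaCPP uses when the property is unset):

```java
import java.io.File;

// Hypothetical diagnostic: report the current user and whether the cache
// directory JavaCPP would use is accessible to this process.
public class CacheDirCheck {
    public static void main(String[] args) {
        String user = System.getProperty("user.name");
        // Default location JavaCPP falls back to when the property is unset
        // (assumption based on reading Loader.java).
        String dir = System.getProperty("org.bytedeco.javacpp.cachedir",
                System.getProperty("user.home") + "/.javacpp/cache");
        File f = new File(dir);
        System.out.println("user=" + user + " dir=" + dir
                + " exists=" + f.exists()
                + " canWrite=" + f.canWrite()
                + " canExecute=" + f.canExecute());
    }
}
```

Running this inside the same JVM (e.g. from a trivial UDF) tells you what the server process actually sees, which may differ from what your shell sees.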

Thanks, Eduardo, for taking a look; appreciated.

Both processes involved are started under the same user (here, araji):

(base) araji@infra:~$ ps -ef |grep ksql
araji 24531 237277 3 23:00 pts/2 00:01:04 /usr/lib/jvm/java-11-openjdk-amd64//bin/java -cp /home/araji/demo/dev/araji/confluent-5.5.0/share/java/confluent-security/ksql/*:/home/araji/demo/dev/araji/confluent-5.5.0/share/java/monitoring-interceptors/*:/home/araji/demo/dev/araji/confluent-5.5.0/share/java/ksqldb/*:/home/araji/demo/dev/araji/confluent-5.5.0/share/java/rest-utils/*:/home/araji/demo/dev/araji/confluent-5.5.0/share/java/confluent-common/*: -Xmx3g -server -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+ExplicitGCInvokesConcurrent -XX:NewRatio=1 -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dksql.log.dir=/home/araji/demo/dev/araji/confluent-5.5.0/logs -Dlog4j.configuration=file:/home/araji/demo/dev/araji/confluent-5.5.0/etc/ksqldb/log4j.properties -Dorg.bytedeco.javacpp.cachedir=/tmp/cachedir -Dksql.server.install.dir=/home/araji/demo/dev/araji/confluent-5.5.0 -Xlog:gc:file=/home/araji/demo/dev/araji/confluent-5.5.0/logs/ksql-server-gc.log:time,tags:filecount=10,filesize=102400 io.confluent.ksql.rest.server.KsqlServerMain tan.properties
araji 28795 260684 0 23:03 pts/12 00:00:08 /usr/lib/jvm/java-11-openjdk-amd64//bin/java -cp /home/araji/demo/dev/araji/confluent-5.5.0/share/java/confluent-security/ksql/*:/home/araji/demo/dev/araji/confluent-5.5.0/share/java/monitoring-interceptors/*:/home/araji/demo/dev/araji/confluent-5.5.0/share/java/ksqldb/*:/home/araji/demo/dev/araji/confluent-5.5.0/share/java/rest-utils/*:/home/araji/demo/dev/araji/confluent-5.5.0/share/java/confluent-common/*: -Xmx3g -server -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+ExplicitGCInvokesConcurrent -XX:NewRatio=1 -Djava.awt.headless=true -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dksql.log.dir=/home/araji/demo/dev/araji/confluent-5.5.0/logs -Dorg.bytedeco.javacpp.cachedir=/tmp/cachedir -Dlog4j.configuration=file:/home/araji/demo/dev/araji/confluent-5.5.0/etc/ksqldb/log4j-file.properties io.confluent.ksql.Ksql -- http://localhost:8088

You will see that I even added the cachedir system property to both of them. After checking the code generating the error in question (https://github.com/bytedeco/javacpp/blob/6aee52e4573dd31f1bff27a5b029adc072a27ca2/src/main/java/org/bytedeco/javacpp/Loader.java#L910), I checked manually:

(base) araji@infra:~$ jshell
| Welcome to JShell -- Version 11.0.6
| For an introduction type: /help intro

jshell> File[] fdirlist = {new File("/tmp/cachedir"), new File("/home/araji/.javacpp/cache/"), new File("/tmp/.javacpp-araji/cache/")};
fdirlist ==> File[3] { /tmp/cachedir, /home/araji/.javacpp/cache, /tmp/.javacpp-araji/cache }

jshell> for (File f : fdirlist) {
   ...>     if (f.exists() && f.canRead() && f.canWrite() && f.canExecute()) { System.out.println(f.getAbsolutePath() + " aok"); } else { System.out.println(f.getAbsolutePath() + " NotOK"); }
   ...> }
/tmp/cachedir aok
/home/araji/.javacpp/cache aok
/tmp/.javacpp-araji/cache aok

Thanks again for taking a look.

Are you sure that you have permission to execute code from those directories? Try creating an executable in those directories and execute it.
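One way to script that check (my sketch, assuming a POSIX system; the default path is just an example):

```java
import java.nio.file.*;
import java.nio.file.attribute.PosixFilePermissions;

// Drop a tiny shell script into the candidate cache directory, mark it
// executable, and run it. A failure here would suggest a "noexec" mount
// or missing execute permission on the directory.
public class ExecProbe {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp/cachedir");
        Files.createDirectories(dir);
        Path script = dir.resolve("probe.sh");
        Files.write(script, "#!/bin/sh\necho ok\n".getBytes());
        Files.setPosixFilePermissions(script,
                PosixFilePermissions.fromString("rwxr-xr-x"));
        Process p = new ProcessBuilder(script.toString()).start();
        System.out.println(p.waitFor() == 0 ? "executable" : "blocked");
    }
}
```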

Thanks, Samuel, for taking a look.

I did verify that and was able to exec; I then ran strace to get more info and got the attached output.
The app is an uberjar built with the following pom, so I assume all the required pieces are in the jar. Could the failure to locate libhdf5*.so actually be triggering that error?

POM:

	<repositories>
		<repository>
			<id>confluent</id>
			<url>http://packages.confluent.io/maven/</url>
		</repository>
	</repositories>


	<properties>
		<java.version>1.11</java.version>
		<kafka.version>2.4.0</kafka.version>

		<confluent.version>5.4.2</confluent.version>
		<dl4j.version>1.0.0-beta6</dl4j.version>

		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	</properties>
	<dependencies>
		<!-- for json rest request  -->
		<dependency>
			<groupId>org.json</groupId>
			<artifactId>json</artifactId>
			<version>20190722</version>
		</dependency>

		<!-- KSQL Dependency is needed to write your own UDF -->
		<dependency>
			<groupId>io.confluent.ksql</groupId>
			<artifactId>ksql-udf</artifactId>
			<version>${confluent.version}</version>
			<exclusions>
				<exclusion>
					<groupId>log4j</groupId>
					<artifactId>log4j</artifactId>
				</exclusion>
			</exclusions>
		</dependency>

		<!-- H2O.ai dependency for the Deep Learning model -->
		<!-- https://mvnrepository.com/artifact/ai.h2o/h2o-genmodel -->

		<dependency>
			<groupId>org.deeplearning4j</groupId>
			<artifactId>deeplearning4j-modelimport</artifactId>
			<version>${dl4j.version}</version>
		</dependency>
		<!-- dependency needed for dl4j , we are using cpu only here -->
		<dependency>
			<groupId>org.nd4j</groupId>
			<artifactId>nd4j-native-platform</artifactId>
			<version>${dl4j.version}</version>
		</dependency>
		<dependency>
			<groupId>org.bytedeco.javacpp-presets</groupId>
			<artifactId>hdf5-platform</artifactId>
			<version>1.10.4-1.4.4</version>
		</dependency>

		<dependency>
			<groupId>ai.h2o</groupId>
			<artifactId>h2o-genmodel</artifactId>
			<version>3.30.0.2</version>
		</dependency>
	</dependencies>
	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>3.6.1</version>
				<configuration>
					<source>11</source>
					<target>11</target>
				</configuration>
			</plugin>

			<!--package as one fat jar -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-assembly-plugin</artifactId>
				<version>2.5.2</version>
				<configuration>
					<descriptorRefs>
						<descriptorRef>jar-with-dependencies</descriptorRef>
					</descriptorRefs>
					<archive>
						<manifest>
							<addClasspath>true</addClasspath>
							<mainClass>${exec.mainClass}</mainClass>
						</manifest>
					</archive>
				</configuration>
				<executions>
					<execution>
						<id>assemble-all</id>
						<phase>package</phase>
						<goals>
							<goal>single</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>
</project>

I think I understand what is going on. There’s probably a SecurityManager preventing you from writing to the file system. You’ll need to give your application the right permissions. I’ve updated JavaCPP in commit https://github.com/bytedeco/javacpp/commit/d7c1b0e228933def3a9dcc823a153540d62531eb to log any such exceptions, which you could try with 1.5.4-SNAPSHOT to confirm.
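If a SecurityManager is the culprit, a minimal probe like this (my own sketch, not JavaCPP's actual check) shows whether a write to the cache path would be denied:

```java
// Probe whether an installed SecurityManager would deny a file write.
// SecurityManager#checkWrite throws SecurityException when denied.
public class WriteProbe {
    public static boolean canWrite(String path) {
        SecurityManager sm = System.getSecurityManager();
        if (sm != null) {
            try {
                sm.checkWrite(path);
            } catch (SecurityException e) {
                return false;   // a policy is blocking the write
            }
        }
        return true;            // no manager installed, or write allowed
    }

    public static void main(String[] args) {
        System.out.println(canWrite("/tmp/cachedir"));
    }
}
```

With no SecurityManager installed this prints `true`; inside a sandboxed UDF runtime it may not.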

Thanks, Samuel, will give that a shot.

Bingo …:

[pid 257928] 09:06:47 access("/tmp/cachedir", W_OK) = 0
Warning: Could not access /tmp/cachedir: A UDF attempted to execute the following cmd: /tmp/cachedir

I would have stared at this for days …, much appreciated the quick response and support.

After disabling the SecurityManager, I am now getting
"java.lang.reflect.InvocationTargetException","java.lang.ExceptionInInitializerError","org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html"

but the pom used already includes the backend:

    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-modelimport</artifactId>
        <version>${dl4j.version}</version>
    </dependency>

    <!-- dependency needed for dl4j, we are using cpu only here -->
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>${dl4j.version}</version>
    </dependency>

    <dependency>
        <groupId>org.bytedeco.javacpp-presets</groupId>
        <artifactId>hdf5-platform</artifactId>
        <version>1.10.4-1.4.4</version>
    </dependency>

Also posted on gitter.im.

Does the log say anything about trying to load libnd4j? Do you find libnd4jcpu.so in your JavaCPP cache directory?

Nothing about libnd4j that I can see in my logs, and I do seem to have the library in the cache:

sudo find / -name libnd4jcpu.so -print
/home/araji/.javacpp/cache/nd4j-native-1.0.0-beta6-linux-x86_64.jar/org/nd4j/nativeblas/linux-x86_64/libnd4jcpu.so

I redid the test, this time with 1.5.4-SNAPSHOT (versus 1.5.2),
and the error message is different (a new warning before the same error):
[pid 113084] 13:12:19 lstat("/tmp/cachedir/anomaly-detection-udf-2.0-jar-with-dependencies.jar/org/bytedeco/hdf5/linux-x86_64/libjnihdf5.so", {st_mode=S_IFREG|0664, st_size=2293624, …}) = 0
[pid 113084] 13:12:19 stat("./g++", 0x7f773fffb950) = -1 ENOENT (No such file or directory)
[pid 113084] 13:12:19 stat("/usr/lib/jvm/java-11-openjdk-amd64/lib/libjnijavacpp.so", 0x7f773fffa2a0) = -1 ENOENT (No such file or directory)
[pid 113084] 13:12:19 stat("/usr/java/packages/lib/libjnijavacpp.so", 0x7f773fffa2a0) = -1 ENOENT (No such file or directory)
[pid 113084] 13:12:19 stat("/usr/lib/x86_64-linux-gnu/jni/libjnijavacpp.so", 0x7f773fffa2a0) = -1 ENOENT (No such file or directory)
[pid 113084] 13:12:19 stat("/lib/x86_64-linux-gnu/libjnijavacpp.so", 0x7f773fffa2a0) = -1 ENOENT (No such file or directory)
[pid 113084] 13:12:19 stat("/usr/lib/x86_64-linux-gnu/libjnijavacpp.so", 0x7f773fffa2a0) = -1 ENOENT (No such file or directory)
[pid 113084] 13:12:19 stat("/usr/lib/jni/libjnijavacpp.so", 0x7f773fffa2a0) = -1 ENOENT (No such file or directory)
[pid 113084] 13:12:19 stat("/lib/libjnijavacpp.so", 0x7f773fffa2a0) = -1 ENOENT (No such file or directory)
[pid 113084] 13:12:19 stat("/usr/lib/libjnijavacpp.so", 0x7f773fffa2a0) = -1 ENOENT (No such file or directory)
Warning: Could not load FloatPointer: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path: [/usr/java/packages/lib, /usr/lib/x86_64-linux-gnu/jni, /lib/x86_64-linux-gnu, /usr/lib/x86_64-linux-gnu, /usr/lib/jni, /lib, /usr/lib]
[pid 113210] 13:12:19 openat(AT_FDCWD, "/sys/fs/cgroup/memory/user/araji/0/memory.limit_in_bytes", O_RDONLY) = 438
[pid 113084] 13:12:19 stat("/home/araji/demo/dev/araji/confluent-5.5.0/META-INF/services/org.nd4j.linalg.factory.Nd4jBackend", 0x7f773fffb4a0) = -1 ENOENT (No such file or directory)
[2020-05-11 13:12:19,230] ERROR {"type":1,"deserializationError":null,"recordProcessingError":{"errorMessage":"Error computing expression ENGINEANOMALY(TEMP1, TEMP2, TEMP3, TEMP4, TEMP5) for column MSE with index 6: Failed to invoke function public double com.dellemc.ksql.functions.EngineAnomaly.engineAnomaly(double,double,double,double,double)","record":null,"cause":["Failed to invoke function public double com.dellemc.ksql.functions.EngineAnomaly.engineAnomaly(double,double,double,double,double)","java.lang.reflect.InvocationTargetException","java.lang.ExceptionInInitializerError","org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html","Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html"]},"productionError":null} (processing.4093695909854621455.Project:44)

Any pointer in the right direction would be appreciated.

Huh, that is weird, it should be looking for libjnijavacpp.so in the cache directory too.
I guess we’ll have to wait for @saudet to take a closer look at what is happening here.
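For context, ND4J discovers its backend through the standard `java.util.ServiceLoader` mechanism, which reads `META-INF/services/<interface>` files from the classpath; that is presumably why the strace above shows a stat on `META-INF/services/org.nd4j.linalg.factory.Nd4jBackend`. A generic illustration using a JDK-provided service interface (not ND4J itself):

```java
import java.nio.file.spi.FileSystemProvider;
import java.util.ServiceLoader;

// Generic ServiceLoader demo: implementations are discovered via
// META-INF/services descriptors (or module declarations) -- the same
// mechanism ND4J relies on to find an Nd4jBackend. If an uberjar build
// drops or overwrites those descriptor files, discovery silently finds
// nothing and ND4J reports NoAvailableBackendException.
public class ServiceLoaderDemo {
    public static void main(String[] args) {
        for (FileSystemProvider p : ServiceLoader.load(FileSystemProvider.class)) {
            System.out.println(p.getScheme() + " -> " + p.getClass().getName());
        }
    }
}
```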

That’s not required, you can ignore those warnings. But please update the version for HDF5.

Thanks, Samuel.

The updated pom is now:

    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-modelimport</artifactId>
        <version>${dl4j.version}</version>
    </dependency>

    <!-- dependency needed for dl4j , we are using cpu only here -->
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>${dl4j.version}</version>
    </dependency>
    <dependency>
        <groupId>org.bytedeco</groupId>
        <artifactId>hdf5-platform</artifactId>
        <version>1.12.0-1.5.3</version>
    </dependency>
    <dependency>
        <groupId>org.bytedeco</groupId>
        <artifactId>javacpp</artifactId>
        <version>1.5.4-SNAPSHOT</version>
    </dependency>

The logs show:

pid 10338] 17:45:54 lstat("/home/araji/.javacpp/cache/anomaly-detection-udf-2.0-jar-with-dependencies.jar/org/bytedeco/javacpp/linux-x86_64/libjnijavacpp.so", {st_mode=S_IFREG|0664, st_size=55864, …}) = 0
[pid 10338] 17:45:54 stat("/home/araji/.javacpp/cache/anomaly-detection-udf-2.0-jar-with-dependencies.jar/org/bytedeco/javacpp/linux-x86_64/libjnijavacpp.so", {st_mode=S_IFREG|0664, st_size=55864, …}) = 0
[pid 10338] 17:45:54 lstat("/home/araji/.javacpp/cache/anomaly-detection-udf-2.0-jar-with-dependencies.jar/org/bytedeco/javacpp/linux-x86_64/libjnijavacpp.so", {st_mode=S_IFREG|0664, st_size=55864, …}) = 0
[pid 10351] 17:45:54 openat(AT_FDCWD, "/sys/fs/cgroup/memory/user/araji/0/memory.limit_in_bytes", O_RDONLY) = 439
[pid 10471] 17:45:54 openat(AT_FDCWD, "/sys/fs/cgroup/memory/user/araji/0/memory.limit_in_bytes", O_RDONLY) = 439
[pid 10473] 17:45:54 openat(AT_FDCWD, "/sys/fs/cgroup/memory/user/araji/0/memory.limit_in_bytes", O_RDONLY) = 439
[pid 10470] 17:45:54 openat(AT_FDCWD, "/sys/fs/cgroup/memory/user/araji/0/memory.limit_in_bytes", O_RDONLY) = 439
[pid 8588] 17:45:54 openat(AT_FDCWD, "/sys/fs/cgroup/memory/user/araji/0/memory.limit_in_bytes", O_RDONLY) = 439
[pid 10338] 17:45:54 stat("/home/araji/demo/dev/araji/confluent-latest/confluent-5.5.0/META-INF/services/org.nd4j.linalg.factory.Nd4jBackend", 0x7f62454ba4a0) = -1 ENOENT (No such file or directory)
[pid 10471] 17:45:54 openat(AT_FDCWD, "/sys/fs/cgroup/memory/user/araji/0/memory.limit_in_bytes", O_RDONLY) = 439
[2020-05-11 17:45:54,863] ERROR {"type":1,"deserializationError":null,"recordProcessingError":{"errorMessage":"Error computing expression ENGINEANOMALY(TEMP1, TEMP2, TEMP3, TEMP4, TEMP5) for column MSE with index 6: Failed to invoke function public double com.dellemc.ksql.functions.EngineAnomaly.engineAnomaly(double,double,double,double,double)","record":null,"cause":["Failed to invoke function public double com.dellemc.ksql.functions.EngineAnomaly.engineAnomaly(double,double,double,double,double)","java.lang.reflect.InvocationTargetException","java.lang.ExceptionInInitializerError","org.nd4j.linalg.factory.Nd4jBackend$NoAvailableBackendException: Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html","Please ensure that you have an nd4j backend on your classpath. Please see: http://nd4j.org/getstarted.html"]},"productionError":null} (processing.3049118052935007767.Project:44)

Are you sure your build doesn't remove any resource files of ND4J?

Not that I know of. How can I check?
The full pom file is here:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.dellemc.apps</groupId>
<artifactId>anomaly-detection-udf</artifactId>
<version>2.0</version>

<repositories>
    <repository>
        <id>confluent</id>
        <url>http://packages.confluent.io/maven/</url>
    </repository>
    <repository>
        <id>sonatype-nexus-snapshots</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    </repository>
</repositories>

<properties>
    <java.version>1.11</java.version>
    <kafka.version>2.4.0</kafka.version>
    <confluent.version>5.4.2</confluent.version>
    <dl4j.version>1.0.0-beta6</dl4j.version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
    <!-- for json rest request -->
    <dependency>
        <groupId>org.json</groupId>
        <artifactId>json</artifactId>
        <version>20190722</version>
    </dependency>

    <!-- KSQL Dependency is needed to write your own UDF -->
    <dependency>
        <groupId>io.confluent.ksql</groupId>
        <artifactId>ksql-udf</artifactId>
        <version>${confluent.version}</version>
        <exclusions>
            <exclusion>
                <groupId>log4j</groupId>
                <artifactId>log4j</artifactId>
            </exclusion>
        </exclusions>
    </dependency>

    <!-- H2O.ai dependency for the Deep Learning model -->
    <!-- https://mvnrepository.com/artifact/ai.h2o/h2o-genmodel -->
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-modelimport</artifactId>
        <version>${dl4j.version}</version>
    </dependency>

    <!-- dependency needed for dl4j, we are using cpu only here -->
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>${dl4j.version}</version>
    </dependency>

    <dependency>
        <groupId>org.bytedeco</groupId>
        <artifactId>hdf5-platform</artifactId>
        <version>1.12.0-1.5.3</version>
    </dependency>

    <dependency>
        <groupId>org.bytedeco</groupId>
        <artifactId>javacpp</artifactId>
        <version>1.5.4-SNAPSHOT</version>
    </dependency>
    <dependency>
        <groupId>ai.h2o</groupId>
        <artifactId>h2o-genmodel</artifactId>
        <version>3.30.0.2</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.6.1</version>
            <configuration>
                <source>11</source>
                <target>11</target>
            </configuration>
        </plugin>
        <!--package as one fat jar -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <version>2.5.2</version>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <addClasspath>true</addClasspath>
                        <mainClass>${exec.mainClass}</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>assemble-all</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
</project>

In particular, ND4J needs this file or it won’t load: https://github.com/eclipse/deeplearning4j/blob/master/nd4j/nd4j-backends/nd4j-backend-impls/nd4j-native/src/main/resources/nd4j-native.properties
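One way to check is to open the assembled uberjar and look for the ND4J resources directly. A sketch (the exact entry names are assumptions based on the linked repository layout, where `nd4j-native.properties` sits at the resources root):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.jar.*;

// Check whether a jar contains specific entries, e.g. the ND4J backend
// service descriptor and nd4j-native.properties.
public class JarResourceCheck {
    static boolean containsEntry(Path jar, String name) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.getEntry(name) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo: build a tiny jar with one entry, then query it.
        // "com.example.Backend" is a placeholder, not a real ND4J class.
        Path jar = Files.createTempFile("demo", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry(
                "META-INF/services/org.nd4j.linalg.factory.Nd4jBackend"));
            out.write("com.example.Backend".getBytes());
            out.closeEntry();
        }
        System.out.println(containsEntry(jar,
            "META-INF/services/org.nd4j.linalg.factory.Nd4jBackend"));
        System.out.println(containsEntry(jar, "nd4j-native.properties"));
    }
}
```

Equivalently, `jar tf anomaly-detection-udf-2.0-jar-with-dependencies.jar | grep -E 'Nd4jBackend|nd4j-native'` from the shell would show whether the assembly kept those files.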

Is it supposed to be provided by any of the Maven dependencies?