libnd4jcpu frame issue

Issue Description

I’m trying to run the ImdbReviewClassificationRNN example from the dl4j examples repo with a CPU configuration rather than a GPU one, and I’m getting this error:

#
#  SIGSEGV (0xb) at pc=0x00007fb000df0e79, pid=29020, tid=0x00007faff8926700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_261-b25) (build 1.8.0_261-b25)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.261-b25 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libnd4jcpu.so+0x5bc0e79]  void functions::transform::TransformAny<float, float>::exec<simdOps::Assign<float, float> >(void const*, long long const*, void*, long long const*, void*, unsigned long, unsigned long)+0x1789

This has run successfully before on another host, but it fails on the host I’m currently on, and I’m not sure what’s causing the failure.

Version Information

Please indicate relevant versions, including, if relevant:

  • Dl4J/ND4J version: 1.0.0-beta7

  • Java version: java version “1.8.0_261”
    Java™ SE Runtime Environment (build 1.8.0_261-b25)
    Java HotSpot™ 64-Bit Server VM (build 25.261-b25, mixed mode)

  • OS version:
    NAME=“Red Hat Enterprise Linux Server”
    VERSION=“7.8 (Maipo)”
    3.10.0-1127.18.2.el7.x86_64

Additional Information

Core stack trace

#1  0x00007fb0d637ea78 in abort () from /lib64/libc.so.6
#2  0x00007fb0d5c66b59 in os::abort(bool) () from /efs/dist/oracle/jdk/1.8.0_261-build001/.exec/x86-64.rhel.6/jre/lib/amd64/server/libjvm.so
#3  0x00007fb0d5e2eb43 in VMError::report_and_die() () from /efs/dist/oracle/jdk/1.8.0_261-build001/.exec/x86-64.rhel.6/jre/lib/amd64/server/libjvm.so
#4  0x00007fb0d5c70e05 in JVM_handle_linux_signal () from /efs/dist/oracle/jdk/1.8.0_261-build001/.exec/x86-64.rhel.6/jre/lib/amd64/server/libjvm.so
#5  0x00007fb0d5c63ca8 in signalHandler(int, siginfo*, void*) () from /efs/dist/oracle/jdk/1.8.0_261-build001/.exec/x86-64.rhel.6/jre/lib/amd64/server/libjvm.so
#6  <signal handler called>
#7  0x00007fb000df0e79 in void functions::transform::TransformAny<float, float>::exec<simdOps::Assign<float, float> >(void const*, long long const*, void*, long long const*, void*, unsigned long, unsigned long) () from /<masked>/data/test/.javacpp/cache/dl4j-examples-1.0.0-beta7-shaded.jar/org/nd4j/nativeblas/linux-x86_64-avx2/libnd4jcpu.so
#8  0x00007faffbe5953d in std::_Function_handler<void (unsigned long, unsigned long), NativeOpExecutioner::execTransformAny(sd::LaunchContext*, int, void const*, long long const*, void const*, long long const*, void*, long long const*, void*, long long const*, void*, long long const*, long long const*, bool)::{lambda(unsigned long, unsigned long)#1}>::_M_invoke(std::_Any_data const&, unsigned long&&, std::_Any_data const&) ()
   from /<masked>/data/test/.javacpp/cache/dl4j-examples-1.0.0-beta7-shaded.jar/org/nd4j/nativeblas/linux-x86_64-avx2/libnd4jcpu.so
#9  0x00007fb00164ca41 in samediff::Threads::parallel_do(std::function<void (unsigned long, unsigned long)>, unsigned long) ()
   from /<masked>/data/test/.javacpp/cache/dl4j-examples-1.0.0-beta7-shaded.jar/org/nd4j/nativeblas/linux-x86_64-avx2/libnd4jcpu.so
#10 0x00007faffbe8b0ca in NativeOpExecutioner::execTransformAny(sd::LaunchContext*, int, void const*, long long const*, void const*, long long const*, void*, long long const*, void*, long long const*, void*, long long const*, long long const*, bool) ()
   from /<masked>/data/test/.javacpp/cache/dl4j-examples-1.0.0-beta7-shaded.jar/org/nd4j/nativeblas/linux-x86_64-avx2/libnd4jcpu.so
#11 0x00007faffbe90d15 in execTransformAny ()
   from /<masked>/data/test/.javacpp/cache/dl4j-examples-1.0.0-beta7-shaded.jar/org/nd4j/nativeblas/linux-x86_64-avx2/libnd4jcpu.so
#12 0x00007faffaf4fc2e in Java_org_nd4j_nativeblas_Nd4jCpu_execTransformAny__Lorg_bytedeco_javacpp_PointerPointer_2ILorg_nd4j_nativeblas_OpaqueDataBuffer_2Lorg_bytedeco_javacpp_LongPointer_2Lorg_bytedeco_javacpp_LongPointer_2Lorg_nd4j_nativeblas_OpaqueDataBuffer_2Lorg_bytedeco_javacpp_LongPointer_2Lorg_bytedeco_javacpp_LongPointer_2Lorg_bytedeco_javacpp_Pointer_2 () from /<masked>/data/test/.javacpp/cache/dl4j-examples-1.0.0-beta7-shaded.jar/org/nd4j/nativeblas/linux-x86_64-avx2/libjnind4jcpu.so
#13 0x00007fb0c1190bf9 in ?? ()
#14 0x0000000000000000 in ?? ()

Did you fix it? I encountered a similar issue.

Problematic frame:

C [libnd4jcpu.so+0x58b5df0] sd::TadDescriptor::TadDescriptor(long long const*, int const*, int, bool)+0x460

Same problem here

A fatal error has been detected by the Java Runtime Environment:

SIGSEGV (0xb) at pc=0x00007fe3c75d9b10, pid=8, tid=169

JRE version: OpenJDK Runtime Environment (17.0.1+12) (build 17.0.1+12-39)

Java VM: OpenJDK 64-Bit Server VM (17.0.1+12-39, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)

Problematic frame:

C [libnd4jcpu.so+0x1743b10] sd::TadDescriptor::TadDescriptor(long long const*, int const*, int, bool)+0x420

While it is the same kind of issue, it isn’t necessarily the same issue.

That is a lot like saying that an exception has been thrown.

Please create a new thread and share more information with us about what you were doing.

@arnaud22 yes, please do as @treo suggested. The causes for this can vary quite a bit, from multithreading problems all the way to a bug on the C++ side. It’s hard to tell without reproducing it. Ideally, give us code we can run, or at least the hs_err_pid*.log file from the directory where the crash happened.
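
If it helps with tracking down the crash log: by default HotSpot writes it as hs_err_pid<pid>.log into the working directory of the crashed process. Below is a minimal sketch for listing those files. The class name CrashLogFinder and the method findCrashLogs are made up for this example and are not part of DL4J or ND4J; it only looks directly inside the directory you pass in, not in subdirectories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for this thread; not part of DL4J/ND4J.
final class CrashLogFinder {

    // Returns the HotSpot crash logs (hs_err_pid<pid>.log) found directly in dir, sorted by path.
    static List<Path> findCrashLogs(Path dir) {
        List<Path> logs = new ArrayList<>();
        // The glob is matched against file names only; subdirectories are not searched.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path p : stream) {
                logs.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(logs);
        return logs;
    }

    public static void main(String[] args) {
        // Assumption: the directory to scan is the first argument, else the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path log : findCrashLogs(dir)) {
            System.out.println(log.toAbsolutePath());
        }
    }
}
```

Run it from the directory where the JVM was started, or pass that directory as the first argument. If nothing shows up, the JVM may have been started with -XX:ErrorFile pointing somewhere else, in which case the log will be at that location instead.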