Hi,
My code is failing with an error I do not understand. Please see below.
I searched for information on this error without much luck. Have you seen it before? Any suggestions on how to proceed to solve it?
Thanks
ERROR
[thread 4658 also had an error][thread 4661 also had an error][thread 4659 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f83d0d5dc77, pid=4569, tid=4660
#
# JRE version: OpenJDK Runtime Environment (20.0.2+9) (build 20.0.2+9-78)
# Java VM: OpenJDK 64-Bit Server VM (20.0.2+9-78, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [libnd4jcpu.so+0xd5dc77] void functions::transform::TransformAny<double, float>::exec<simdOps::Assign<double, float> >(void const*, long long const*, void*, long long const*, void*, unsigned long, unsigned long)+0x11d7
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/deeplearning4j-examples-master_1/dl4j-examples/hs_err_pid4569.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
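For readers triaging a similar crash, the two most useful lines in the banner above are the signal line and the line after "Problematic frame:", which names the native library and symbol where the JVM died. The following is only a minimal sketch of pulling those out of a banner or an hs_err_pid*.log file. The class name CrashBannerSummary and both regular expressions are hypothetical, written against the layout shown in this post; they are not part of the JDK or of ND4J.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: summarizes the banner of a HotSpot fatal-error report. */
final class CrashBannerSummary {

    // Assumed layout: "# SIGSEGV (0xb) at pc=..., pid=..., tid=..."
    private static final Pattern SIGNAL =
            Pattern.compile("^#\\s*(SIG[A-Z]+)\\s+\\(");

    // Assumed layout: "# C [libnd4jcpu.so+0xd5dc77] <symbol>"
    private static final Pattern FRAME =
            Pattern.compile("^#\\s*[A-Za-z]\\s+\\[([^+\\]]+)\\+0x[0-9a-fA-F]+\\]\\s*(.*)$");

    /** Returns "<signal> in <library>: <symbol>", using "unknown" for anything not found. */
    static String summarize(List<String> lines) {
        String signal = "unknown";
        String library = "unknown";
        String symbol = "unknown";
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            Matcher s = SIGNAL.matcher(line);
            if (s.find()) {
                signal = s.group(1);
            }
            // The frame itself is on the line following the "Problematic frame:" header.
            if (line.equals("# Problematic frame:") && i + 1 < lines.size()) {
                Matcher f = FRAME.matcher(lines.get(i + 1).trim());
                if (f.find()) {
                    library = f.group(1);
                    symbol = f.group(2);
                }
            }
        }
        return signal + " in " + library + ": " + symbol;
    }

    public static void main(String[] args) {
        // Lines copied from the banner in this post; the symbol is shortened for readability.
        List<String> banner = Arrays.asList(
                "# SIGSEGV (0xb) at pc=0x00007f83d0d5dc77, pid=4569, tid=4660",
                "# Problematic frame:",
                "# C [libnd4jcpu.so+0xd5dc77] void functions::transform::TransformAny<double, float>::exec(...)+0x11d7");
        System.out.println(summarize(banner));
    }
}
```

Here the frame points into libnd4jcpu.so, meaning the fault is in ND4J's native CPU backend and not in Java code, which is consistent with the "native crash" diagnosis in the reply below.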
@adonnini that could be anything. Can you tell me a bit more about it? It’s a native crash: a C++ segfault (the native equivalent of a Java null pointer exception). Chances are it’s an already-fixed bug, but I’d be happy to take a look.
Hi Adam,
If I disable enough of the print statements that call .eval() and
.eval(placeholderData), the error stops.
It started happening when I made some significant changes to the code
with regard to the shapes of certain SDVariables and model parameter sizes.
To give you an idea of what happens, here is an example of what happens
with an SDVariable called dropoutVar (nothing special about it, I could
as easily have used another SDVariable):
- If the following print statement is enabled
System.out.println(" EncoderLayer - dropoutVar.eval().shapeInfoToString() - " + dropoutVar.eval(placeholderData).shapeInfoToString());
the app crashes with the exit code 134
- If the above print statement is disabled, the code runs and when it
gets to this line
INDArray dropoutVarArray = dropoutVar.getArr();
it stops with the following error
Cannot invoke "org.nd4j.linalg.api.ndarray.INDArray.data()" because "x" is null
- With the print statement still disabled, if I disable this line
INDArray dropoutVarArray = dropoutVar.getArr();
and replace it with this line
INDArray dropoutVarArray = dropoutVar.eval(placeholderData);
the app crashes with the exit code 134
I hope this helps
Thanks,
Alex
Hi Adam,
Sorry to bother you. If I cannot resolve this latest problem I cannot
work on my app any longer. This problem is a real show stopper.
I know you are super busy. I would really appreciate it if you could
help me on this.
If this is a bug which has been fixed, how and where can I get the version
of the software with the fix?
Thank you,
Alex
@adonnini you’re welcome to build from source till I can get the builds up: Build From Source - Deeplearning4j
Beyond that, it might have something to do with reuse of variables and some being deleted at the native level.
Hi Adam,
Thanks. Sorry for bugging you again about this but you told me to ping
you about it once in a while. What’s the current ETA for the next release?
Frankly, based on past experience with a number of other packages, if I
were to try to build from source I would likely run into issues that I
would need to ask you about, wasting your time. In my opinion, that is
not a smart way to go.
Forgive my next question. How likely do you think it is that the problem
will be resolved in the next release?
Thanks,
Alex
@adonnini based on commercial deployments, highly likely. There were a number of race conditions with deallocation that would sometimes happen and that have since been fixed. I’m hoping to have it out in the next few months, but right now I’m still running into a few issues with CUDA.
I can look at getting snapshots up and running after this. What system are you potentially building for?
That being said, as usual I know almost nothing about your usage here. If code you wrote is suddenly causing this, there might be a workaround.
Thanks for your response. I appreciate it.
- My target architecture is AMD64. On my system I am running
Debian Bookworm and using the latest stable release of IntelliJ IDEA
Community Edition.
- What are your rates for commercial support? If I were a commercial
support customer, would I have access to “commercial deployment” versions
where the problem has been resolved? Or, did I misunderstand your "based
@adonnini send me an email at adam@konduit.ai and I’ll see what I can do for you.