Training a CNN with CNN2DFormat.NHWC causes a crash with exit code -1073741819 (0xC0000005)

Fabian · August 11, 2020, 6:30pm

Hi,

I’m trying to train a convolutional neural network (using deeplearning4j-core and nd4j-native-platform, both at version 1.0.0-beta7) whose input data is formatted as CNN2DFormat.NHWC. However, training the network causes my application to crash. Here’s a small self-contained example (written in Kotlin) that reproduces the problem:

import org.deeplearning4j.nn.conf.CNN2DFormat
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.inputs.InputType
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer
import org.deeplearning4j.nn.conf.layers.DenseLayer
import org.deeplearning4j.nn.conf.layers.LossLayer
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.factory.Nd4j

fun main() {
    val format = CNN2DFormat.NHWC
    val convolutionActivation = Activation.RELU
    val denseActivation = Activation.SIGMOID

    val configuration = NeuralNetConfiguration.Builder()
        .list(  
            convolutionLayer(3, 10, intArrayOf(7, 3), convolutionActivation, format),
            maxPoolLayer(intArrayOf(1, 3), format),
            convolutionLayer(10, 20, intArrayOf(3, 3), convolutionActivation, format),
            maxPoolLayer(intArrayOf(1, 3), format),
            denseLayer(256, denseActivation),
            denseLayer(1, denseActivation),
            LossLayer.Builder().build()
        )
        .setInputType(InputType.convolutional(15, 80, 3, format))
        .build()

    val net = MultiLayerNetwork(configuration)
    net.init()

    val inputData = when (format) {  
        CNN2DFormat.NCHW -> Nd4j.create(256, 3, 15, 80)  
        CNN2DFormat.NHWC -> Nd4j.create(256, 15, 80, 3)  
    }  
    val desiredOutput = Nd4j.create(256, 1)  
  
    println("Inferring network output for some given inputs...")  
    net.output(inputData)  
  
    println("Training the network with some data...")  
    net.fit(inputData, desiredOutput)  
  
    println("Done!")  
}  
  
private fun convolutionLayer(nIn: Int, nOut: Int, kernelSize: IntArray, activation: Activation, format: CNN2DFormat): ConvolutionLayer =
    ConvolutionLayer.Builder()
        .nIn(nIn)
        .nOut(nOut)
        .kernelSize(*kernelSize)
        .activation(activation)
        .dataFormat(format)
        .build()

private fun maxPoolLayer(kernelSize: IntArray, format: CNN2DFormat): SubsamplingLayer =
    SubsamplingLayer.Builder()
        .kernelSize(*kernelSize)
        .stride(*kernelSize)
        .dataFormat(format)
        .build()

private fun denseLayer(nOut: Int, activation: Activation): DenseLayer =
    DenseLayer.Builder()
        .nOut(nOut)
        .activation(activation)  
        .build()

When running this example, I’d expect the following console output:

Inferring network output for some given inputs...
Training the network with some data...
Done!

However, when I run the example, I only get the following output before the application crashes. Please note that this is the complete log: I’ve run the example multiple times and never saw a stack trace or crash log.

Inferring network output for some given inputs...
Training the network with some data...

Process finished with exit code -1073741819 (0xC0000005)
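
For reference, the two numbers in that last line are the same value. The exit code is a signed 32-bit integer, and its unsigned hexadecimal form, 0xC0000005, is the Windows status code STATUS_ACCESS_VIOLATION, i.e. an invalid memory access, which points at native code rather than at a Java exception. A minimal sketch of the conversion follows; the helper name `toNtStatusHex` is made up for illustration and is not part of DL4J or ND4J:

```kotlin
// Hypothetical helper (not part of DL4J): formats a signed 32-bit process
// exit code as the unsigned hexadecimal value used for Windows status codes.
fun toNtStatusHex(exitCode: Int): String =
    // %X renders a negative Int in its two's-complement (unsigned) form.
    "0x%08X".format(exitCode)

fun main() {
    println(toNtStatusHex(-1073741819)) // prints 0xC0000005
}
```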

While playing around with the example (i.e. changing the values of format, convolutionActivation and denseActivation), I’ve noticed that the crash (at least on my machine) only occurs when format is set to CNN2DFormat.NHWC and convolutionActivation is set to either Activation.RELU or Activation.RRELU. If I change format to CNN2DFormat.NCHW, or change convolutionActivation to anything other than Activation.RELU or Activation.RRELU, everything runs as expected.
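
Since the NCHW path runs fine, one possible stopgap is to reorder NHWC data into NCHW before handing it to the network. The sketch below only illustrates how the two layouts index the same values, using a plain row-major FloatArray; `nhwcToNchw` is a hypothetical helper, not a DL4J or ND4J function. On ND4J arrays the same reordering would normally be expressed with `permute(0, 3, 1, 2)`, which is not shown here and was not tested against beta7.

```kotlin
// Hypothetical helper: copies a flat, row-major NHWC buffer into NCHW order.
// n = batch size, h = height, w = width, c = channels.
fun nhwcToNchw(src: FloatArray, n: Int, h: Int, w: Int, c: Int): FloatArray {
    require(src.size == n * h * w * c) { "buffer size does not match the given shape" }
    val dst = FloatArray(src.size)
    for (b in 0 until n) for (y in 0 until h) for (x in 0 until w) for (ch in 0 until c) {
        val nhwcIndex = ((b * h + y) * w + x) * c + ch // channels vary fastest
        val nchwIndex = ((b * c + ch) * h + y) * w + x // width varies fastest
        dst[nchwIndex] = src[nhwcIndex]
    }
    return dst
}

fun main() {
    // One 2x2 image with 3 channels; pixel p holds the values 10p, 10p+1, 10p+2.
    val nhwc = floatArrayOf(0f, 1f, 2f, 10f, 11f, 12f, 20f, 21f, 22f, 30f, 31f, 32f)
    // After conversion, all values of channel 0 come first, then channel 1, then channel 2.
    println(nhwcToNchw(nhwc, 1, 2, 2, 3).joinToString())
}
```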

Over on GitHub I’ve found this memory-related issue, this issue related to a crashing test which is already fixed, and another issue related to a crashing benchmark which is also fixed (I can’t add a third link in this post, but the issue ID is 4895). While these issues are all related to exit code -1073741819, they don’t quite seem to match the problem I’m facing. Since I’m still a beginner with DL4J (and therefore can’t really tell whether this crash is my own fault or caused by a bug somewhere), I thought it might be better to ask for help here on the forums instead of opening a new issue on GitHub. There clearly must be something I’m getting wrong here. Is there some limitation on the supported activation functions when using CNN2DFormat.NHWC? Is it, generally speaking, a bad idea to use CNN2DFormat.NHWC instead of the default format (CNN2DFormat.NCHW)? Thanks in advance for any help or suggestions.

agibsonccc · August 13, 2020, 8:57am

Do you have the native crash log anywhere? Generally it is named something like hs_err_pid<pid>.log.
The error output will give you the absolute path of that crash log.
Regardless, that should be a Java exception, not a JVM crash.

Fabian · August 13, 2020, 10:28am

I can’t find any native crash log at all. So far I’ve tried running the example from within my IDE (i.e. IntelliJ) and from a built JAR file. As I couldn’t find a crash log in either the current working directory or my temporary directory, I also tried specifying a path where a possible native crash log should be saved (see the attached screenshot below). However, no log file was created at the path I specified.
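
One way to double-check this kind of search is sketched below. It assumes the log path was set through the HotSpot option -XX:ErrorFile (the thread does not say which option was used), and `findCrashLogs` is a hypothetical helper name. The snippet prints whether the running JVM actually received that option, then looks for hs_err_pid<pid>.log files in the working directory and the temp directory, the two places HotSpot writes them by default:

```kotlin
import java.io.File
import java.lang.management.ManagementFactory

// Hypothetical helper: lists HotSpot fatal error logs (hs_err_pid<pid>.log)
// found directly inside the given directory.
fun findCrashLogs(dir: File): List<File> =
    dir.listFiles().orEmpty()
        .filter { it.isFile && it.name.startsWith("hs_err_pid") && it.name.endsWith(".log") }
        .sortedBy { it.name }

fun main() {
    // Confirms whether the option reached this JVM (IDE run configurations can drop it).
    val errorFileFlag = ManagementFactory.getRuntimeMXBean().inputArguments
        .firstOrNull { it.startsWith("-XX:ErrorFile=") }
    println("ErrorFile option seen by the JVM: ${errorFileFlag ?: "<none>"}")

    // Default locations: the working directory first, then the temp directory.
    val candidates = listOf(System.getProperty("user.dir"), System.getProperty("java.io.tmpdir"))
    for (path in candidates) {
        println("$path -> ${findCrashLogs(File(path)).map { it.name }}")
    }
}
```

Running this from the same run configuration as the failing example shows whether the option is being passed at all. It does not explain why no log is written when the process dies.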

sparse · April 5, 2021, 9:02pm

Hello,

I am facing the same error:

Process finished with exit code -1073741819 (0xC0000005)

I can’t upload the log file for some reason, so I am sharing it below instead:

Using DL4J 1.0.0-beta7 and nd4j-native as the nd4j backend.

Thanks for your time.

agibsonccc · April 6, 2021, 2:36am

@sparse can you use the latest version and see what it does?
https://deeplearning4j.konduit.ai/config/config-snapshots

sparse · April 6, 2021, 9:16am

Using the latest version seems to fix the issue.

However, when I enable AVX2, the following error is produced:

Cannot resolve org.nd4j:nd4j-native:1.0.0-SNAPSHOT

The error persists when I switch only the ND4J backend version to beta7.

agibsonccc · April 9, 2021, 5:51am

@sparse thanks for the feedback. Let me look into this. Ping me again sometime next week just in case you don’t see feedback.
