Hi,
I’m trying to train a convolutional neural network (using deeplearning4j-core and nd4j-native-platform, both in version 1.0.0-beta7) where the input data is formatted as CNN2DFormat.NHWC. However, I’ve run into an issue where training the network causes my application to crash. Here’s a small self-contained example (written in Kotlin) that illustrates the problem:
import org.deeplearning4j.nn.conf.CNN2DFormat
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.inputs.InputType
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer
import org.deeplearning4j.nn.conf.layers.DenseLayer
import org.deeplearning4j.nn.conf.layers.LossLayer
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.factory.Nd4j
fun main() {
    val format = CNN2DFormat.NHWC
    val convolutionActivation = Activation.RELU
    val denseActivation = Activation.SIGMOID

    val configuration = NeuralNetConfiguration.Builder()
        .list(
            convolutionLayer(3, 10, intArrayOf(7, 3), convolutionActivation, format),
            maxPoolLayer(intArrayOf(1, 3), format),
            convolutionLayer(10, 20, intArrayOf(3, 3), convolutionActivation, format),
            maxPoolLayer(intArrayOf(1, 3), format),
            denseLayer(256, denseActivation),
            denseLayer(1, denseActivation),
            LossLayer.Builder().build()
        )
        .setInputType(InputType.convolutional(15, 80, 3, format))
        .build()

    val net = MultiLayerNetwork(configuration)
    net.init()

    val inputData = when (format) {
        CNN2DFormat.NCHW -> Nd4j.create(256, 3, 15, 80)
        CNN2DFormat.NHWC -> Nd4j.create(256, 15, 80, 3)
    }
    val desiredOutput = Nd4j.create(256, 1)

    println("Inferring network output for some given inputs...")
    net.output(inputData)

    println("Training the network with some data...")
    net.fit(inputData, desiredOutput)

    println("Done!")
}

private fun convolutionLayer(nIn: Int, nOut: Int, kernelSize: IntArray, activation: Activation, format: CNN2DFormat): ConvolutionLayer =
    ConvolutionLayer.Builder()
        .nIn(nIn)
        .nOut(nOut)
        .kernelSize(*kernelSize)
        .activation(activation)
        .dataFormat(format)
        .build()

private fun maxPoolLayer(kernelSize: IntArray, format: CNN2DFormat): SubsamplingLayer =
    SubsamplingLayer.Builder()
        .kernelSize(*kernelSize)
        .stride(*kernelSize)
        .dataFormat(format)
        .build()

private fun denseLayer(nOut: Int, activation: Activation): DenseLayer =
    DenseLayer.Builder()
        .nOut(nOut)
        .activation(activation)
        .build()
When running this example, I’d expect the following output on the console:
Inferring network output for some given inputs...
Training the network with some data...
Done!
However, when I run the example, I only get the following output before the application crashes. Please note that this is the complete log output. I’ve executed the example multiple times but never saw a stack trace or crash log.
Inferring network output for some given inputs...
Training the network with some data...
Process finished with exit code -1073741819 (0xC0000005)
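For reference, the negative exit code and the hex value in parentheses are the same 32-bit number, and 0xC0000005 is the Windows status code for an access violation. As far as I can tell, that points to a crash in native code rather than a JVM exception, which would also explain the missing stack trace. A minimal Kotlin sketch of the conversion (exitCodeToHex is just a helper name I made up for illustration):

```kotlin
// Hypothetical helper: formats a signed 32-bit process exit code as unsigned hex.
// On the JVM, %X renders a negative Int as its two's-complement bit pattern.
fun exitCodeToHex(exitCode: Int): String = "0x%08X".format(exitCode)

fun main() {
    // Exit code reported when the example crashes.
    println(exitCodeToHex(-1073741819)) // prints 0xC0000005
}
```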
While playing around with the example (i.e. changing the values of format, convolutionActivation and denseActivation), I’ve noticed that the crash (at least on my machine) only occurs when format is set to CNN2DFormat.NHWC and convolutionActivation is set to either Activation.RELU or Activation.RRELU. If I change format to CNN2DFormat.NCHW, or if I change convolutionActivation to anything other than Activation.RELU or Activation.RRELU, everything runs as expected.
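Since the NCHW variant works for me, one workaround I’m considering is to configure the network with CNN2DFormat.NCHW and rearrange my NHWC data before passing it in. The following is only a sketch under that assumption: nhwcToNchw is a helper name I made up, and I haven’t checked how the extra copy affects performance.

```kotlin
import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j

// Hypothetical helper: reorders [batch, height, width, channels]
// into [batch, channels, height, width]. permute() only returns a view,
// so dup() is used to obtain a separate, contiguous copy.
fun nhwcToNchw(input: INDArray): INDArray =
    input.permute(0, 3, 1, 2).dup()

fun main() {
    val nhwc = Nd4j.create(256, 15, 80, 3)
    val nchw = nhwcToNchw(nhwc)
    println(nchw.shape().contentToString()) // prints [256, 3, 15, 80]
}
```

With this, the network in the example above would be built with format = CNN2DFormat.NCHW and trained on the converted array.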
Over on GitHub I’ve found this memory related issue, this issue related to a crashing test which is already fixed and another issue related to a crashing benchmark which is also fixed (I can’t add a third link in this post, but the issue ID is 4895). While these issues all involve exit code -1073741819, they don’t quite seem to match the problem I’m facing.
Since I’m still a beginner with DL4J (and therefore can’t really tell whether this crash is my own fault or a bug somewhere), I thought it would be better to ask for help here on the forums instead of opening a new issue on GitHub. There must be something I’m getting wrong here. Is there some limitation on the supported activation functions when using CNN2DFormat.NHWC? Is it generally a non-ideal solution to use CNN2DFormat.NHWC instead of the default format (CNN2DFormat.NCHW)? Thanks in advance for any help or suggestions.