Autoencoder sanity check

Hello there.

I am new to DL4J. I am trying to create an autoencoder and have had a look at several examples. My dataset has 11 doubles as input, and I would like to build an autoencoder that reduces the 11 features to a code of size 2 or 3 for nice visualisations. Before doing that, I wanted to run a sanity check using a code of size 11: a dense multi-layer network with 5 layers, each having 11 nodes. In that case the input can simply be passed straight through, and a perfect score should be obtainable. I have tried many different settings, but nothing made DL4J learn this solution. I am looking for help and ideas so that I can get this sanity check to pass; after that I want to reduce the code size.
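
To make the sanity check concrete: with identity weight matrices, zero biases, and identity activations, every dense layer computes y = Wx + b = x, so the stacked network is an exact pass-through and zero reconstruction error is attainable. Here is a minimal plain-Java sketch of that argument (no DL4J; the class name and layer loop are just for illustration):

```java
public class IdentityPassThrough {

    // One dense layer with identity activation: y = W*x + b
    static double[] dense(double[][] w, double[] x, double[] b) {
        double[] y = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            y[i] = b[i];
            for (int j = 0; j < x.length; j++) {
                y[i] += w[i][j] * x[j];
            }
        }
        return y;
    }

    public static void main(String[] args) {
        int n = 11;
        double[][] identity = new double[n][n];
        for (int i = 0; i < n; i++) identity[i][i] = 1.0;

        double[] input = new double[n];
        for (int i = 0; i < n; i++) input[i] = Math.random();

        // Five 11x11 layers, identity weights, zero bias: output must equal input
        double[] h = input;
        for (int layer = 0; layer < 5; layer++) {
            h = dense(identity, h, new double[n]);
        }
        for (int i = 0; i < n; i++) {
            if (Math.abs(h[i] - input[i]) > 1e-12) {
                throw new AssertionError("mismatch at index " + i);
            }
        }
        System.out.println("pass-through OK");
    }
}
```

Since this solution exists, a trained network should at least be able to approach it; the check failing points at configuration rather than capacity.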

Here are some things I have tried.

Normalization

  • None
  • NormalizerStandardize
  • NormalizerMinMaxScaler

Batch size

  • 64


Epochs

  • 30+, but there is no convergence after only a couple of epochs

Weight Initializations

  • Identity
  • RELU
  • Xavier


Updaters

  • AdaGrad(0.05)
  • Adam(0.05)

Activation functions

  • Identity
  • Relu
  • Sigmoid


Optimization algorithms

  • Stochastic gradient descent
  • Line gradient descent


L2 regularization

  • 0.0001
  • disabled


Network depth

  • 0-3 hidden layers

Loss function

  • MSE
  • MAE

I have tried many of the above combinations, but the network just will not converge to anything. The error stays pretty much the same as right after weight initialization. When feeding the inputs through the network, I see that the response is somewhat, but not very, similar to the original input. It should be nearly identical, as no encoding should be occurring. I also see that some internal nodes have an activation of 0 (depending on the settings).
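
One way to make "somewhat, but not very, similar" precise is to compute the per-sample reconstruction MSE between input and output; for a working pass-through it should be essentially 0. A small plain-Java helper (outside DL4J; the class and method names are just illustrative) that could be applied to each input row and its reconstruction:

```java
public class ReconstructionError {

    // Mean squared error between one input row and its reconstruction
    static double mse(double[] input, double[] reconstruction) {
        double sum = 0.0;
        for (int i = 0; i < input.length; i++) {
            double d = input[i] - reconstruction[i];
            sum += d * d;
        }
        return sum / input.length;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        System.out.println(mse(x, new double[]{1, 2, 3}));  // perfect reconstruction -> 0.0
        System.out.println(mse(x, new double[]{1, 2, 4}));  // one of 3 dims off by 1 -> 0.333...
    }
}
```

As an aside, internal activations of exactly 0 are expected with ReLU (it clips negative pre-activations to 0), which is one reason the identity activation is the cleaner choice for this particular check.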

I have no idea what I am doing wrong; it seems I have tried enough variants. Is there maybe something wrong with my custom data iterator? When inspecting the data set, I do see the same numbers as features and labels.

Please help me solve this mystery. Thank you very much.

Here is the code, for reference.

public class CAMSTAMUnsupervised {

    private static int trainBatchSize = 64;
    private static int testBatchSize = 1;
    private static int numEpochs = 30;

    public static String dataLocalPath;

    public static void main(String[] args) throws Exception {

        File modelFile = new File(dataLocalPath, "camstam.gz");
        DataSetIterator trainIterator = new CAMSTAMDataSetIterator(new File(dataLocalPath, "camstam_with_hr_features.csv").getAbsolutePath(), trainBatchSize);
        DataSetIterator testIterator = new CAMSTAMDataSetIterator(new File(dataLocalPath, "camstam_with_hr_features.csv").getAbsolutePath(), testBatchSize);
        System.out.println("Input Columns: " + trainIterator.inputColumns());
        System.out.println("Output Columns: " + trainIterator.totalOutcomes());
        MultiLayerNetwork net = createModel(trainIterator.inputColumns(), trainIterator.totalOutcomes());
        UIServer uiServer = UIServer.getInstance();
        StatsStorage statsStorage = new InMemoryStatsStorage();
        uiServer.attach(statsStorage);

        //DataNormalization normalizer = new NormalizerStandardize();
        DataNormalization normalizer = new NormalizerMinMaxScaler();;                                 // Collect training data statistics
        trainIterator.setPreProcessor(normalizer);
        testIterator.setPreProcessor(normalizer);                    // Note: using training normalization statistics
        NormalizerSerializer.getDefault().write(normalizer, new File(dataLocalPath, "anomalyDetectionNormlizer.ty").getAbsolutePath());

        // training
        net.setListeners(new StatsListener(statsStorage), new ScoreIterationListener(10));, numEpochs);

        // Sanity check: feed each test sample through and compare the output to the input
        while (testIterator.hasNext()) {
            DataSet ds =;
            List<INDArray> result = net.feedForward(ds.getFeatures());
        }
    }

    public static MultiLayerNetwork createModel(int inputNum, int outputNum) {
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                //.updater(new AdaGrad(0.05))
                .updater(new Adam(0.05))
                .list()
                .layer(0, new DenseLayer.Builder().nIn(inputNum).nOut(11).build())
                .layer(1, new DenseLayer.Builder().nIn(11).nOut(11).build())
                .layer(2, new DenseLayer.Builder().nIn(11).nOut(11).build())
                .layer(3, new OutputLayer.Builder().nIn(11).nOut(outputNum).build())
                .build();
        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        return net;
    }
}

I have found the problem. I assumed that the .activation() call on the NeuralNetConfiguration.Builder sets the default for all layers. It does not, however. After setting the identity activation separately on each layer, it works perfectly. With identity weight initialization the solution is there immediately, and with other weight initializations the optimal weights are learned. The output can be reconstructed nicely. Here is now my network configuration, for reference (reducing 11 dimensions to 10).

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .updater(new Adam(0.5))
        .list()
        .layer(0, new DenseLayer.Builder().nIn(inputNum).nOut(11).activation(Activation.IDENTITY).build())
        .layer(1, new DenseLayer.Builder().nIn(11).nOut(10).activation(Activation.IDENTITY).build())
        .layer(2, new DenseLayer.Builder().nIn(10).nOut(11).activation(Activation.IDENTITY).build())
        .layer(3, new OutputLayer.Builder().nIn(11).nOut(outputNum).activation(Activation.IDENTITY).build())
        .build();