LSTM - All inputs yield the same output

Hello,

I’m trying to create a simple LSTM with 2 input features and a time-series length of 1. However, I’m running into a strange issue: after training the network, feeding in test data yields the same arbitrary result regardless of the input values. My code is shown below.

import org.deeplearning4j.nn.api.OptimizationAlgorithm;
import org.deeplearning4j.nn.conf.BackpropType;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.LSTM;
import org.deeplearning4j.nn.conf.layers.RnnOutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.DataSet;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.config.Sgd;
import org.nd4j.linalg.lossfunctions.LossFunctions;

public class LSTMRegression {
    public static final int inputSize = 2,
                            lstmLayerSize = 4,
                            outputSize = 1;

    public static final double learningRate = 0.001;

    public static void main(String[] args) {
        int miniBatchSize = 29;

        // Three stacked LSTM layers feeding an MSE regression output layer.
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(new Sgd(learningRate))
                .list()
                .layer(0, new LSTM.Builder().nIn(inputSize).nOut(lstmLayerSize)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.IDENTITY).build())
                .layer(1, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SIGMOID).build())
                .layer(2, new LSTM.Builder().nIn(lstmLayerSize).nOut(lstmLayerSize)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.SIGMOID).build())
                .layer(3, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MSE)
                        .weightInit(WeightInit.XAVIER)
                        .activation(Activation.IDENTITY)
                        .nIn(lstmLayerSize).nOut(outputSize).build())
                .backpropType(BackpropType.TruncatedBPTT)
                .tBPTTForwardLength(miniBatchSize)
                .tBPTTBackwardLength(miniBatchSize)
                .build();

        var network = new MultiLayerNetwork(conf);

        network.init();
        network.fit(getTrain());

        System.out.println(network.output(getTest()));
    }

    // Test features: one example per row, 2 input features, 1 time step each.
    public static INDArray getTest() {
        double[][][] test = new double[][][]{
            {{20}, {203}},
            {{16}, {183}},
            {{20}, {190}},
            {{18.6}, {193}},
            {{18.9}, {184}},
            {{17.2}, {199}},
            {{20}, {190}},
            {{17}, {181}},
            {{19}, {197}},
            {{16.5}, {198}},
            ...
        };

        return Nd4j.create(test);
    }

    // Training features and labels: one target value per example.
    public static DataSet getTrain() {
        double[][][] inputArray = {
            {{18.7}, {181}},
            {{17.4}, {186}},
            {{18}, {195}},
            {{19.3}, {193}},
            {{20.6}, {190}},
            {{17.8}, {181}},
            {{19.6}, {195}},
            {{18.1}, {193}},
            {{20.2}, {190}},
            {{17.1}, {186}},
            ...
        };

        double[][] outputArray = {
            {3750},
            {3800},
            {3250},
            {3450},
            {3650},
            {3625},
            {4675},
            {3475},
            {4250},
            {3300},
            ...
        };

        INDArray input = Nd4j.create(inputArray);
        INDArray labels = Nd4j.create(outputArray);

        return new DataSet(input, labels);
    }
}
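For context, my understanding is that DL4J expects RNN features in shape [miniBatchSize, nIn, timeSeriesLength], so each {{18.7}, {181}} entry above should come out as one example with 2 features and 1 time step. A quick shape check along these lines (a hypothetical snippet, not part of my original code) shows what the arrays become:

    INDArray features = getTrain().getFeatures();
    INDArray labels = getTrain().getLabels();
    System.out.println(java.util.Arrays.toString(features.shape())); // e.g. [29, 2, 1]
    System.out.println(java.util.Arrays.toString(labels.shape()));   // 2-D as written, e.g. [29, 1]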

Here’s an example of the output from that final println in main:

[[[0.4380]], 

 [[0.4380]], 

 [[0.4380]], 

 [[0.4380]], 

 [[0.4380]], 

 [[0.4380]], 

 [[0.4380]], 

 [[0.4380]], 

 [[0.4380]], 

 [[0.4380]],
 ...
]


So far I’ve tried changing the updater (previously Adam), the activation functions (previously ReLU), and the learning rate, all with similar results. The variants looked roughly like the sketch below.
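(A sketch only; Adam comes from org.nd4j.linalg.learning.config.Adam, and the exact hyperparameter values I tried varied.)

    // Updater variant: Adam instead of plain SGD.
    .updater(new Adam(learningRate))
    // Activation variant: ReLU instead of IDENTITY/SIGMOID in the hidden layers.
    .activation(Activation.RELU)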

I’ve also previously normalized the data, which didn’t change the fact that every output was the same. Because of this, I’ve left the normalization out of the code above for readability (but please let me know if you’d like to see the code and/or outputs with normalization). That normalization was roughly along the lines of the sketch below.
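(A minimal sketch assuming ND4J’s org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize; my actual code may have differed slightly. fitLabel(true) rescales the labels as well as the features.)

    NormalizerStandardize normalizer = new NormalizerStandardize();
    normalizer.fitLabel(true);       // normalize labels as well as features

    DataSet train = getTrain();
    normalizer.fit(train);           // compute mean/std from the training set
    normalizer.transform(train);     // apply in place
    network.fit(train);

    INDArray test = getTest();
    normalizer.transform(test);      // apply the same training statistics to the test features
    INDArray output = network.output(test);
    normalizer.revertLabels(output); // map predictions back to the original label scale
    System.out.println(output);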

Thank you.

@TwistedTea I’m already interacting with you on Stack Overflow. Sorry, the system flagged your post as spam for some reason. I’ve removed that flag and will link your cross-post here: java - LSTM in DL4J - All output values are the same - Stack Overflow