Can’t figure out how to fix DL4JInvalidInputException

So I’m new to machine learning, and I’ve been trying to start training a model, but I’ve been getting org.deeplearning4j.exception.DL4JInvalidInputException for a day or two.

I’ve tried setting builder.setInputType(InputType.recurrent(inputSize)); (inputSize being 5, as I have 5 features), but I can’t figure it out. I asked ChatGPT and still couldn’t crack it, and looking around forums and the docs hasn’t helped either. Please help me out, thanks in advance.

Here’s the code below (comically called robot):

    public void processData() {
        ObservableList<OHLCData> data = giveData();
        System.out.println("Data size: " + data.size());
    
        int timeStep = 50;  // Minimum timestep for LSTM
    
        List<double[][]> featureList = new ArrayList<>();
        List<double[]> labelList = new ArrayList<>();
    
        for (int i = 0; i < data.size() - timeStep; i++) {
            double[][] features = new double[timeStep][5];  // 5 features per timestep
            for (int j = 0; j < timeStep; j++) {
                features[j] = extractFeatures(data.get(i + j));
            }
            featureList.add(features);
            labelList.add(new double[]{data.get(i + timeStep).getClose()}); // Predicting the next close price
        }
    
        INDArray featureArray = Nd4j.create(featureList.toArray(new double[0][0][0]));
        INDArray labelArray = Nd4j.create(labelList.toArray(new double[0][0]));
    
        // Debugging: print the shapes of the arrays before reshaping
        System.out.println("Before Reshaping:");
        System.out.println("Feature Array Shape: " + featureArray.shapeInfoToString());
        System.out.println("Label Array Shape: " + labelArray.shapeInfoToString());

        // Manually set the shape to ensure consistency
        featureArray = featureArray.reshape('c', featureArray.size(0), timeStep, 5); // explicitly [batch, timeSteps, features]
        labelArray = labelArray.reshape('c', labelArray.size(0), 1, 1);  // [batch, timeStep=1, output]

        // Debugging: check the shape after reshaping
        System.out.println("After Reshaping:");
        System.out.println("Feature Array Shape: " + featureArray.shapeInfoToString());
        System.out.println("Label Array Shape: " + labelArray.shapeInfoToString());

    
        DataSet dataset = new DataSet(featureArray, labelArray);
        normalizeData(dataset);
        trainModel(dataset);
    }

    private double[] extractFeatures(OHLCData data) {
        return new double[]{
            data.getOpen(),
            data.getHigh(),
            data.getLow(),
            data.getClose(),
            data.getVolume()
        };
    }

    private void trainModel(DataSet dataset) {
        int inputSize = 5;  // Matching the 5 extracted features
        int outputSize = 1; // Predicting the next close price

        // Configure the network
        NeuralNetConfiguration.ListBuilder builder = new NeuralNetConfiguration.Builder()
                .seed(123)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .list()
                .layer(0, new LSTM.Builder()
                        .nIn(inputSize)
                        .nOut(50)
                        .activation(Activation.TANH)
                        .build())
                .layer(1, new RnnOutputLayer.Builder()
                        .nIn(50)
                        .nOut(outputSize)
                        .activation(Activation.IDENTITY)
                        .lossFunction(LossFunctions.LossFunction.MSE)
                        .build());
    
 
        // Build and initialize the network
        MultiLayerNetwork net = new MultiLayerNetwork(builder.build());
        net.init();
        net.setListeners(new ScoreIterationListener(100));
    
        // Train the model
        ListDataSetIterator<DataSet> trainIterator = new ListDataSetIterator<>(dataset.asList(), 32);
        for (int epoch = 0; epoch < 10; epoch++) {
            net.fit(trainIterator);
            System.out.println("Epoch " + epoch + " complete");
        }
    
        System.out.println("Model training complete!");
        this.model = net;
    }
    private void normalizeData(DataSet dataset) {
        // Min-max scaler (scales values between 0 and 1)
        DataNormalization normalizer = new NormalizerMinMaxScaler(0, 1);

        // Fit the normalizer using dataset statistics (min/max values)
        normalizer.fit(dataset);

        // Transform the dataset (apply normalization)
        normalizer.transform(dataset);

        System.out.println("Data normalization complete.");
    }

Below are some logs.
(error) - org.deeplearning4j.exception.DL4JInvalidInputException: Received input with size(1) = 50 (input array shape = [32, 50, 5]); input.size(1) must match layer nIn size (nIn = 5)

Logs regarding reshaping:

Before Reshaping:
Feature Array Shape: Rank: 3, DataType: DOUBLE, Offset: 0, Order: c, Shape: [9950,50,5], Stride: [250,5,1]
Label Array Shape: Rank: 2, DataType: DOUBLE, Offset: 0, Order: c, Shape: [9950,1], Stride: [1,1]

After Reshaping:
Feature Array Shape: Rank: 3, DataType: DOUBLE, Offset: 0, Order: c, Shape: [9950,50,5], Stride: [250,5,1]
Label Array Shape: Rank: 3, DataType: DOUBLE, Offset: 0, Order: c, Shape: [9950,1,1], Stride: [1,1,1]

Data size log (confirming it has the required data):

Data size: 10000

What I’m trying to accomplish is training a model to predict market data with 10000 datapoints, using a timestep of 50 so the model can use 50 points to predict the next one (the 51st), with 5 features (Open, High, Low, Close, Volume), as I’ve removed the computed indicators for now. (That’s also why the feature array has 9950 rows: 10000 − 50 sliding windows.)

I’m not a fantastic coder, but I don’t rely on ChatGPT for everything; I mainly use it for planning and research, as it’s great at aggregating information with sources. Like I said, I’ve tried looking into the docs and have attempted to educate myself on ML along with DL4J, but I’m just stuck.

@cho you need to use setInputType(…), similar to what’s in the examples. That will automatically set the number of inputs and outputs for each layer.

I think I’ve already tried that but still get the error.

        int inputSize = 5;  // Matching the 5 extracted features
        int outputSize = 1; // Predicting the next close price

        // Configure the network
        NeuralNetConfiguration.ListBuilder builder = new NeuralNetConfiguration.Builder()
                .seed(123)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .list()
                .layer(0, new LSTM.Builder()
                        .nIn(inputSize)
                        .nOut(50)
                        .activation(Activation.TANH)
                        .build())
                .layer(1, new RnnOutputLayer.Builder()
                        .nIn(50)
                        .nOut(outputSize)
                        .activation(Activation.IDENTITY)
                        .lossFunction(LossFunctions.LossFunction.MSE)
                        .build());

        builder.setInputType(InputType.recurrent(5));
    
 
        // Build and initialize the network
        MultiLayerNetwork net = new MultiLayerNetwork(builder.build());
        net.init();
        net.setListeners(new ScoreIterationListener(100));
    
        // Train the model
        ListDataSetIterator<DataSet> trainIterator = new ListDataSetIterator<>(dataset.asList(), 32);
        for (int epoch = 0; epoch < 10; epoch++) {
            net.fit(trainIterator);
            System.out.println("Epoch " + epoch + " complete");
        }
    
        System.out.println("Model training complete!");
        this.model = net;
    }
    

Even with builder.setInputType(InputType.recurrent(5)); in place, I still can’t get it to start training the model.

This tells you the issue pretty much explicitly: You’ve got the timesteps and feature order mixed up.

If you take a closer look at the InputType.recurrent docs, you can see that you can give it a couple more parameters.

In particular you can give it an RNNFormat parameter, which tells it what kind of order it should expect.

There are two options:

  • NCW: (minibatch, features, timesteps)
  • NWC: (minibatch, timesteps, features)

The abbreviations are derived from CNN-related naming, so they are a little awkward in a recurrent setting.
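In your case the data is [minibatch, timesteps, features], i.e. NWC. A minimal sketch of the fix, assuming a DL4J version that has the RNNFormat overloads (1.0.0-beta7 or newer):

    import org.deeplearning4j.nn.conf.RNNFormat;
    import org.deeplearning4j.nn.conf.inputs.InputType;

    // The feature array is [9950, 50, 5] = [minibatch, timesteps, features] (NWC),
    // while the LSTM layer defaults to NCW. Declaring the actual format lets the
    // builder set up the input handling and infer nIn for each layer:
    builder.setInputType(InputType.recurrent(5, RNNFormat.NWC));

Once setInputType is in place, you can also drop the explicit .nIn(...) calls, since they are inferred for you.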

I think this is definitely the right answer to the issue, but in my case I’m guessing I’ve prepped the data wrong. I was reshaping the array and got past this error only to end up with another one, which resulted in me needing to reduce the dimensions using a GlobalPooling layer to get the model to start training (rough sketch below).
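In case it helps anyone else, this is roughly the shape of what I ended up with (a sketch, not my exact code): the pooling layer collapses the time dimension, so a plain OutputLayer with 2D labels of shape [batch, 1] predicts one close price per window.

    NeuralNetConfiguration.ListBuilder builder = new NeuralNetConfiguration.Builder()
            .seed(123)
            .list()
            .layer(0, new LSTM.Builder()
                    .nOut(50)
                    .activation(Activation.TANH)
                    .build())
            // Collapses the time dimension ([batch, features, timesteps] -> [batch, features]),
            // so the net emits one prediction per window instead of one per timestep
            .layer(1, new GlobalPoolingLayer.Builder()
                    .poolingType(PoolingType.AVG)
                    .build())
            .layer(2, new OutputLayer.Builder()
                    .nOut(1)
                    .activation(Activation.IDENTITY)
                    .lossFunction(LossFunctions.LossFunction.MSE)
                    .build());

    // Labels can stay 2D ([batch, 1]) -- no reshape to [batch, 1, 1] needed
    builder.setInputType(InputType.recurrent(5, RNNFormat.NWC));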

As a programmer, I don’t think I’m too bad, but I’ve definitely approached implementing a deep learning model the wrong way. I have some knowledge of the use cases for ML vs. DL, but I don’t think I’ve read enough to fully wrap my head around how everything works and fits together.

So on that note, are there any resources you can personally recommend that helped you understand this topic?

Also, a sidenote: where can I change how many cores ND4J can use? I’ve changed the updater to Nadam and implemented a dynamic learning-rate schedule to avoid overfitting, while setting layer dropout to 0.2 and 0.1 (can’t lie, I definitely used ChatGPT for this one). I expected the model to train much slower, but I also noticed my CPU is underutilized, sitting at around 30-40%.

    .updater(new Nadam(new MapSchedule(ScheduleType.EPOCH, Map.ofEntries(
            Map.entry(0, 0.0001),    // Warm-up
            Map.entry(5, 0.0003),
            Map.entry(10, 0.0005),   // Peak phase
            Map.entry(30, 0.0008),
            Map.entry(50, 0.001),    // Peak learning rate
            Map.entry(75, 0.0008),   // Start gradual decay
            Map.entry(100, 0.0005),
            Map.entry(150, 0.0003),
            Map.entry(200, 0.0001),  // Fine-tuning phase
            Map.entry(250, 0.00005),
            Map.entry(275, 0.00001),
            Map.entry(300, 0.000005) // Final refinement
    ))))
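For reference, that updater call chains onto the NeuralNetConfiguration.Builder before .list(); trimmed down it looks roughly like this (lrByEpoch is just a stand-in name for the full epoch map above):

    import java.util.Map;
    import org.nd4j.linalg.learning.config.Nadam;
    import org.nd4j.linalg.schedule.MapSchedule;
    import org.nd4j.linalg.schedule.ScheduleType;

    // Stand-in for the full schedule above
    Map<Integer, Double> lrByEpoch = Map.ofEntries(
            Map.entry(0, 0.0001),
            Map.entry(50, 0.001),
            Map.entry(300, 0.000005));

    NeuralNetConfiguration.ListBuilder builder = new NeuralNetConfiguration.Builder()
            .seed(123)
            .updater(new Nadam(new MapSchedule(ScheduleType.EPOCH, lrByEpoch)))
            .list(); // ...then the layers as before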

There are two parts to this: engineering and math. The math aspect can be learned from various sources. The engineering is a bit more tied to the framework itself; there I can only suggest that you read the documentation and check out the Javadoc as well.

If you want to reduce the number of cores it uses, you can set the OMP_NUM_THREADS environment variable. It controls how many threads OpenMP, which ND4J uses under the hood for parallelism, will use at most.
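For example, launching the JVM with OMP_NUM_THREADS=4 set in its environment caps the OpenMP thread pool that ND4J’s native backend uses at four threads.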

But I suppose you want to know how to get it to use more of your CPU. The reason you see underutilization is that you are using a recurrent model. Those models have an inherent bottleneck in how much parallelism they can exploit, because each timestep must be computed in sequence.

That is actually why transformers became a leading technology: at least the input can be processed in parallel in architectures that use them.
