Suggestions Thank you

MintakaB · July 19, 2025, 7:32pm

Dear,
I have an extensive application using DL4J, and I have a few comments on the API that could improve its integration. I am actually very happy with the performance and the general approach to solving ML use cases on time series.

1. Label Normalization Setup Is Unintuitive

The requirement to call fitLabel(true) before fit() is not obvious and is easy to miss, leading to runtime errors (NullPointerException).
If fitLabel(true) is called after fit(), label stats are not initialized, but no warning is given.
2. Dummy Dataset Construction Is Error-Prone

Users must manually construct a dummy dataset with min/max values for both features and labels, matching the exact shapes expected by the model.
Shape mismatches or missing label data silently result in null stats and runtime errors later.
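
A minimal sketch of the kind of up-front check this point asks for, written in plain Java and not using the DL4J API. The class `DummyStatsCheck` and its `validate` method are hypothetical names chosen for illustration; the idea is only to reject missing or mismatched min/max rows immediately instead of ending up with null stats later.

```java
/** Hypothetical helper, not part of DL4J: validates min/max rows before they are used. */
final class DummyStatsCheck {
    private DummyStatsCheck() {}

    /**
     * Fails fast if the min/max rows are missing, have the wrong number of
     * columns, or contain a column whose min is greater than its max.
     */
    static void validate(String name, double[] min, double[] max, int expectedColumns) {
        if (min == null || max == null) {
            throw new IllegalArgumentException(
                name + ": min/max row is missing, so stats would stay uninitialized");
        }
        if (min.length != expectedColumns || max.length != expectedColumns) {
            throw new IllegalArgumentException(
                name + ": expected " + expectedColumns + " columns but got min="
                    + min.length + ", max=" + max.length);
        }
        for (int i = 0; i < expectedColumns; i++) {
            if (min[i] > max[i]) {
                throw new IllegalArgumentException(
                    name + ": min > max at column " + i);
            }
        }
    }
}
```

Calling `DummyStatsCheck.validate("labels", labelMin, labelMax, nOut)` before building the dummy dataset would surface a shape problem at the point where it is introduced (`labelMin`, `labelMax` and `nOut` are placeholders here).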
3. Lack of Direct Min/Max Setters

There is no API to directly set min/max values for features and labels, which would be much more convenient for production workflows with large datasets.
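
To make the request concrete, here is a rough sketch of what direct setters could look like. It is plain Java with arrays, not the DL4J normalizer, and every name in it (`MinMaxScalerSketch`, `setFeatureRange`, `setLabelRange`, and so on) is hypothetical. It assumes scaling to the range [0, 1].

```java
import java.util.Objects;

/** Hypothetical sketch, not the DL4J API: min-max scaling with directly set ranges. */
final class MinMaxScalerSketch {
    private double[] featureMin, featureMax, labelMin, labelMax;

    /** Sets per-column feature ranges directly, without fitting on any data. */
    void setFeatureRange(double[] min, double[] max) {
        featureMin = min.clone();
        featureMax = max.clone();
    }

    /** Sets per-column label ranges directly, without fitting on any data. */
    void setLabelRange(double[] min, double[] max) {
        labelMin = min.clone();
        labelMax = max.clone();
    }

    double[] scaleFeatures(double[] row) { return scale(row, featureMin, featureMax); }

    double[] scaleLabels(double[] row) { return scale(row, labelMin, labelMax); }

    /** Maps scaled labels in [0, 1] back to the original label range. */
    double[] revertLabels(double[] scaled) {
        Objects.requireNonNull(labelMin, "label range not set");
        double[] out = new double[scaled.length];
        for (int i = 0; i < scaled.length; i++) {
            out[i] = scaled[i] * (labelMax[i] - labelMin[i]) + labelMin[i];
        }
        return out;
    }

    private static double[] scale(double[] row, double[] min, double[] max) {
        Objects.requireNonNull(min, "range not set");
        double[] out = new double[row.length];
        for (int i = 0; i < row.length; i++) {
            double range = max[i] - min[i];
            // A constant column has zero range; map it to 0 instead of dividing by zero.
            out[i] = range == 0.0 ? 0.0 : (row[i] - min[i]) / range;
        }
        return out;
    }
}
```

With setters like these, known production ranges could be supplied once at startup and no pass over a large dataset (or a dummy dataset) would be needed.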
4. Documentation Could Be Improved

The documentation should clearly state the order of method calls and provide examples for initializing the normalizer with custom min/max values for both features and labels.
5. Error Messages Could Be More Helpful

When label stats are not initialized, the error message could suggest checking the order of fitLabel(true) and fit() calls, or the shape of the dummy dataset.
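
As an illustration of the error handling suggested here, the following plain-Java sketch models the call order with a small state check. It does not use or reproduce the DL4J normalizer; `LabelStatsGuard` and its methods are hypothetical, and the message texts are only examples of the wording that would help.

```java
/** Hypothetical sketch, not the DL4J API: descriptive errors for label-stat misuse. */
final class LabelStatsGuard {
    private boolean fitLabels;
    private boolean fitCalled;
    private double[] labelMinMax; // stays null until fit() runs with label fitting enabled

    /** Enables or disables label fitting; rejects enabling it after fit() already ran. */
    void fitLabel(boolean enabled) {
        if (enabled && fitCalled) {
            throw new IllegalStateException(
                "fitLabel(true) was called after fit(); call fitLabel(true) first, "
                    + "then call fit() so that label stats are collected.");
        }
        fitLabels = enabled;
    }

    /** Collects min and max of the labels, but only if label fitting is enabled. */
    void fit(double[] labels) {
        fitCalled = true;
        if (!fitLabels) {
            return;
        }
        if (labels == null || labels.length == 0) {
            throw new IllegalArgumentException(
                "Label fitting is enabled but the dataset contains no labels.");
        }
        double min = labels[0];
        double max = labels[0];
        for (double v : labels) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        labelMinMax = new double[] {min, max};
    }

    /** Returns {min, max} for labels, or explains what to check if they are missing. */
    double[] labelStats() {
        if (labelMinMax == null) {
            throw new IllegalStateException(
                "Label stats are not initialized. Call fitLabel(true) before fit(), "
                    + "and check that the dataset passed to fit() has labels of the expected shape.");
        }
        return labelMinMax.clone();
    }
}
```

The point of the sketch is that the failure names the likely cause (call order or dataset shape) instead of surfacing as a bare NullPointerException further down the pipeline.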

agibsonccc · July 20, 2025, 10:21pm

@MintakaB thanks for the suggestions! Happy to address them in the next release. I’ve mainly been focused on internals as of late. The C++ library needed a lot of modernization, and that’s not something users really see. Please feel free to file an issue and I’ll mark it for improvement though!

One suggestion I have: if you’re not seeing something, it’s less than ideal of course, but please do take a look at the tests in the main repo.
