Hyperparameters have no effect


As part of my MSc research project I’m using DL4J in conjunction with wekaDeepLearning4J (version 1.7.2) for a land-classification task. I’m using the Dl4jMlpClassifier and the Dl4jResNet50 model.

I’m trying to “tune” the various hyperparameters, such as those for early stopping, dropout, the optimisation algorithm, etc. The issue is that none of them have any effect on the results. The only one that does is the mini-batch size set on the ImageInstanceIterator, which has a pretty dramatic effect even with small changes.

Looking at the DL4J code, I came across this comment in the org.deeplearning4j.ui.module.train.TrainModule class:

//TODO: Maybe L1/L2, dropout, updater-specific values etc

Does the library support these hyperparameters, and if so why might my model seem to ignore them?

Thanks in advance!

@Chondron Yes, we do have L1/L2. Do you have some code I can look at? I’m not familiar with all the aspects of the wekadl4j project, but I’m happy to take a look.

Hello. Thanks for replying so promptly. I’m not doing it in code; that was going to be the next step if I couldn’t find a solution (which I’d rather avoid, really). I’m setting up and calling the model via the command line/class; see below. I have around 1,200 images, and the L1 setting of 0.001 (in this instance) doesn’t change the output. I must admit I’m more interested in changing the optimisation algorithm from SGD to LBFGS and selecting Gaussian dropout than in the L1/L2 regularisation factors. Are you suggesting these aren’t implemented yet?

Thanks again.

weka.classifiers.functions.Dl4jMlpClassifier -S 1 -cache-mode MEMORY -early-stopping "weka.dl4j.earlystopping.EarlyStopping -maxEpochsNoImprovement 0 -valPercentage 0.0" -normalization "Standardize training data" -iterator "weka.dl4j.iterators.instance.ImageInstanceIterator -channelsLast false -height 224 -imagesLocation /home/adventure/MSc/_AAA_Research/data/test_dataset/all -numChannels 3 -width 224 -bs 8" -iteration-listener "weka.dl4j.listener.EpochListener -eval true -n 5" -layer "weka.dl4j.layers.OutputLayer -lossFn \"weka.dl4j.lossfunctions.LossMCXENT \" -nOut 2 -activation \"weka.dl4j.activations.ActivationSoftmax \" -name \"Output layer\"" -logConfig "weka.core.LogConfiguration -append true -dl4jLogLevel WARN -logFile /home/adventure/wekafiles/wekaDeeplearning4j.log -nd4jLogLevel INFO -wekaDl4jLogLevel INFO" -config "weka.dl4j.NeuralNetConfiguration -biasInit 0.0 -biasUpdater \"weka.dl4j.updater.Sgd -lr 0.001 -lrSchedule \\\"weka.dl4j.schedules.ConstantSchedule -scheduleType EPOCH\\\"\" -dist \"weka.dl4j.distribution.Disabled \" -dropout \"weka.dl4j.dropout.Disabled \" -gradientNormalization None -gradNormThreshold 1.0 -l1 0.001 -l2 NaN -minimize -algorithm STOCHASTIC_GRADIENT_DESCENT -updater \"weka.dl4j.updater.Adam -beta1MeanDecay 0.9 -beta2VarDecay 0.999 -epsilon 1.0E-8 -lr 0.001 -lrSchedule \\\"weka.dl4j.schedules.ConstantSchedule -scheduleType EPOCH\\\"\" -weightInit XAVIER -weightNoise \"weka.dl4j.weightnoise.Disabled \"" -numEpochs 10 -numGPUs 1 -averagingFrequency 10 -prefetchSize 24 -queueSize 0 -zooModel "weka.dl4j.zoo.Dl4jResNet50 -channelsLast false -pretrained IMAGENET"
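For reference, the -l1 and -l2 options in the -config string above set regularisation coefficients. In the conventional formulation, shown below, the penalties are added to the data loss and their gradients are added to each weight's gradient. The exact scaling DL4J uses (for example the factor of 1/2 on the L2 term) is an assumption here and worth checking against its documentation.

$$
\mathcal{L}_{\text{total}}(\mathbf{w}) = \mathcal{L}_{\text{data}}(\mathbf{w}) + \lambda_1 \sum_i |w_i| + \frac{\lambda_2}{2} \sum_i w_i^2
$$

$$
\frac{\partial \mathcal{L}_{\text{total}}}{\partial w_i} = \frac{\partial \mathcal{L}_{\text{data}}}{\partial w_i} + \lambda_1 \operatorname{sign}(w_i) + \lambda_2 w_i
$$

Here λ₁ = 0.001, so the L1 term adds a gradient of magnitude 0.001 to every nonzero weight; assuming -l2 NaN means L2 is unset, the λ₂ terms drop out.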

@Chondron Apologies, I don’t know either way. Could you tell me which Weka version you’re using? I can check the source code to see what will and won’t work. Thanks!

It’s Weka 3.9.6 and wekaDeepLearning4J 1.7.2 (both the most recent versions, I think). Judging by the jar files, the Deeplearning4j backend is version 1.0.0-beta7.


@Chondron beta7 is 2 years old; those versions are way out of date. Let me take a look at the wekadeeplearning4j project, though. From the looks of it, the command-line parameter is already there.
You may just be having normal tuning issues.

Could you elaborate more on your dataset and the like? I’d like to see if it’s possible to upgrade the dl4j version there as well.

Not sure what you mean by elaborating on the dataset. It’s 1,200 images at 224x224 pixels, each covering (roughly) a 20 m x 20 m patch. I’m pretty sure the various hyperparameter settings I’ve experimented with should do something to the results.

The Deeplearning4j jars are the ones that come with wekaDeepLearning4J. I’ll see if I can get hold of newer jars and check that they don’t break the Weka bits. Probably won’t be until tomorrow now, though.


@Chondron I wonder whether the Weka code does any scaling; if it doesn’t, then it may not matter how you tune. Going off the guide here: GitHub - Waikato/wekaDeeplearning4j (Weka package for the Deeplearning4j Java library).

You should be able to create your images a certain way; make sure you can create them scaled (e.g. normalized to the range 0 to 1).

Unfortunately, I’ve no idea what you mean by this.

@Chondron I mean scaling your data to the range 0 to 1. Do you know whether it does that before training the model?

I’ve got the jars for 1.0.0-M2.1, which appears to be the latest release. It seems to work (so far), so I’m having a play. It seems about 4x slower, though, which is a bit concerning.

Sorry, I’m obviously being thick, but scaling my satellite image patches to between zero and one? I really don’t know what that refers to, and I’ve not come across anything among the various hyperparameters that seems to match it. Do you have a reference that explains it? Unless you’re referring to an activation function or something…


@Chondron Pixel scaling: 0-255 → 0 to 1.
Almost every neural network has some form of image normalization built into it, to ensure the input values aren’t too large for the network to learn a pattern.
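A minimal Java sketch of the pixel scaling described above: each 8-bit value is divided by 255 so that 0-255 maps onto 0 to 1. PixelScaling and scaleToUnitRange are hypothetical names used only for illustration. Within DL4J itself, org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler performs this kind of 0-255 → 0 to 1 scaling on DataSets; whether wekaDeepLearning4J applies it (or an equivalent) to ImageInstanceIterator input is an assumption that would need checking in the package source.

```java
import java.util.Arrays;

/**
 * Sketch of pixel scaling: maps 8-bit pixel values (0-255) to [0, 1].
 * PixelScaling / scaleToUnitRange are hypothetical illustration names,
 * not part of DL4J or Weka.
 */
public class PixelScaling {

    /** Largest value an 8-bit channel can hold. */
    public static final double MAX_8BIT = 255.0;

    /** Divides each 8-bit pixel value by 255, giving a value in [0, 1]. */
    public static double[] scaleToUnitRange(int[] pixels) {
        double[] scaled = new double[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            // Reject anything that is not a valid 8-bit intensity.
            if (pixels[i] < 0 || pixels[i] > 255) {
                throw new IllegalArgumentException("Not an 8-bit pixel value: " + pixels[i]);
            }
            scaled[i] = pixels[i] / MAX_8BIT;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // A few raw intensities from one channel of an image patch.
        int[] raw = {0, 51, 255};
        System.out.println(Arrays.toString(scaleToUnitRange(raw))); // prints [0.0, 0.2, 1.0]
    }
}
```

Dividing by the fixed maximum of 255, rather than by each image's own minimum and maximum, keeps the scale identical across all images in the dataset.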

Cheers. I’ll expand my understanding by reading around this, and also try to see what Weka does (I’d kind of hope it handles this sort of thing OK).

Actually, thinking about it, I already knew that! Clearly having a slow-neuron day.