Hello, I’ve looked through multiple threads about this same issue (some of them three years old), yet no matter what dataset I feed in or how I tune it, the classifier continues to skip classes at random.
I created a classifier modeled on the IrisClassifier that can be found in the DL4J examples repository:
with the following modifications:
-changing the iterations
-changing the activation functions
-updating the learning rate
-removing/modifying l2 regularization
-changing the loss function
-changing the amount of data used for training vs. testing
Nothing I’ve tried produces a run in which all classes are considered. I’ve hit a wall and have absolutely no idea how to continue. Any help is greatly appreciated.
@chapipo could you clarify what version you’re using? Tuning results vary WILDLY with the individual neural network, dataset, and version of the library.
Thank you for your reply. I grabbed the library here using Git Bash and installed it with Apache Maven 3.6.3, following this tutorial. The IrisClassifier works perfectly normally, with very good results.
Thank you for the tuning links. I’ve checked them out, and here are the changes I’ve tried:
-reducing the hidden layers to one
-switching to stochastic gradient descent, with learning rates of 1e-3 and 1e-4
-training for 10, 100, or 1000 epochs
-varying the amount of training data used
-attempting gradient normalization
-changing the loss function
-changing the activation function
Unfortunately, the model only ever recognizes each class on its own, or at most two at a time.
I already had this data normalization implemented:
I apologize if I’m missing something or if I’m slow to learn; I’m still a greenhorn at neural networks. Is there anything else I can try?
@chapipo lower your learning rate.
Get rid of your l2.
Change your optimizer/updater to Adam.
Lastly, how big is your dataset?
If the whole thing fits in memory, then our default minibatch knob (where we normalize the gradients by the batch size) prevents the network from fully learning.
That’s a reasonable default for most problems, but not for toy problems where people are just getting started with something like 100 examples.
Set miniBatch(false) at the top of the configuration if that’s the case.
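Putting that advice together, a configuration sketch along these lines might help (this assumes a recent DL4J API; numInputs, numClasses, the seed, and the layer sizes are all placeholders you would adjust):

```java
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// numInputs and numClasses are placeholders for your dataset's dimensions.
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .miniBatch(false)          // whole dataset in memory: don't normalize gradients by batch size
        .updater(new Adam(1e-3))   // Adam updater with a lowered learning rate
        .weightInit(WeightInit.XAVIER)
        // note: no .l2(...) call -- regularization removed, as suggested above
        .list()
        .layer(new DenseLayer.Builder().nIn(numInputs).nOut(16)
                .activation(Activation.RELU).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
                .activation(Activation.SOFTMAX).nOut(numClasses).build())
        .build();
```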
@chapipo just to eliminate some variables, try to simplify your training pipeline a bit. Remove the train/test split and see if your model can learn from the whole dataset.
This is mainly to see if your model can overfit first.
One other question I have…what’s the breakdown of your dataset? It could be imbalanced classes. Sometimes you spend hours tuning only to realize your dataset is the problem.
Oftentimes people don’t spend enough time ensuring their datasets actually have signal.
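A quick way to get that breakdown, assuming you can pull your labels out as plain ints (a sketch; LabelHistogram is a made-up helper name, not part of DL4J):

```java
import java.util.Map;
import java.util.TreeMap;

public class LabelHistogram {
    /** Counts how many examples fall into each class label. */
    public static Map<Integer, Integer> breakdown(int[] labels) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }
}
```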
If your model can overfit at least it can learn something.
Beyond that I would suggest potentially using the UI and visualizing the training as well. UI examples here:
My dataset is indeed imbalanced and skewed toward one specific class. Is there anything within the neural network I can do to get around this limitation?
Thank you for the UI visualization tools, I will take a look at them shortly.
@chapipo that’s definitely your problem, then. Could you give a breakdown of your label distribution? For underrepresented classes you’ll want to generate a dataset using something like over- or under-sampling (basically, either balance your dataset or repeatedly sample from the underrepresented labels).
If that doesn’t work then weighted loss functions also come to mind. This example should help:
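For the weighted-loss route, a sketch of what that can look like in DL4J (the class weights below are hypothetical; in practice you would set each weight roughly inversely proportional to that class’s frequency):

```java
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.lossfunctions.impl.LossMCXENT;

// Hypothetical weights for a 3-class problem: rare classes weighted up,
// the dominant class weighted down.
INDArray classWeights = Nd4j.create(new double[]{5.0, 5.0, 0.5});

OutputLayer out = new OutputLayer.Builder()
        .lossFunction(new LossMCXENT(classWeights))  // per-class weighted cross-entropy
        .activation(Activation.SOFTMAX)
        .nOut(3)                                     // number of classes (placeholder)
        .build();
```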
Heavily skewed, in fact, but unfortunately this is the only data set I can gather from the source I want to test:
class 0: 42/1097
class 1: 47/1097
class 2: 1008/1097
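Given counts like those, naive random oversampling can be sketched as follows (Oversampler is a hypothetical helper; the labels stand in for whole data rows, and in practice you would duplicate the corresponding feature vectors as well):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class Oversampler {
    /**
     * Naive random oversampling: duplicates examples of minority classes
     * until every class matches the size of the largest one.
     */
    public static List<Integer> balance(List<Integer> labels, Random rng) {
        // group the indices of the examples by class label
        Map<Integer, List<Integer>> indicesByClass = new HashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            indicesByClass.computeIfAbsent(labels.get(i), k -> new ArrayList<>()).add(i);
        }
        int largest = 0;
        for (List<Integer> idx : indicesByClass.values()) {
            largest = Math.max(largest, idx.size());
        }
        List<Integer> balanced = new ArrayList<>();
        for (List<Integer> idx : indicesByClass.values()) {
            for (int i = 0; i < largest; i++) {
                // use every original example once, then resample at random
                int pick = i < idx.size() ? idx.get(i) : idx.get(rng.nextInt(idx.size()));
                balanced.add(labels.get(pick));
            }
        }
        Collections.shuffle(balanced, rng);
        return balanced;
    }
}
```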
I have followed your advice and repeatedly sampled from the underrepresented classes, and: