Weighted learning based on input data / feature

Hello,

I’m looking to modify the training process by weighting input examples based on a feature, e.g. making all female examples 1.7 times as important.
So far I’ve come across the weighted loss function, which doesn’t help me since it determines weight based on the class/label rather than the feature.
A custom loss function has the same limitation, since as far as I can see I can only access the labels and preOutput arrays.

Is there a way to weight examples, other than by transforming the input data itself (e.g. by duplicating the female examples)?

Thanks in advance.

I’m not quite sure I follow what you are trying to do. Making a feature more important isn’t something you can properly do with any kind of machine learning: pretty much by definition, all learning algorithms try to use every feature you give them to fit your data as best they can.

If you just want to make sure that examples with a specific feature value get better (training) accuracy, you can make them appear more often during training, effectively unbalancing your training set if it was balanced beforehand.

What you are asking for has been asked for before: Per-example weights · Issue #2616 · eclipse/deeplearning4j · GitHub

But we currently don’t have any specific ways to do it other than manually.

Sorry for the lack of clarity; I was looking for the second option: giving more weight to specific examples without modifying the dataset.

By “doing it manually” you mean unbalancing the training set, correct?

Thank you for your speedy reply.

Yes.

You have multiple options for how to attack that:

  • You can just duplicate the relevant examples in your input data (see the sketch below)
  • You can wrap the record reader that reads your data and emit those examples multiple times
  • You can wrap the DataSetIterator and create unbalanced DataSets based on the incoming DataSet

The first two options are likely to be easier to implement correctly.
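To make the first option concrete, here is a minimal sketch in plain Java that oversamples rows of a CSV file before it ever reaches a record reader. The file names, the column index of the sex feature, and the matching value are assumptions for illustration, not taken from your setup. A non-integer weight like your 1.7 can be approximated stochastically: each matching row is kept once and gets an extra copy with probability equal to the fractional part, so matching rows carry 1.7x weight in expectation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class CsvOversampler {
    public static void main(String[] args) throws IOException {
        int sexColumn = 2;      // assumed index of the sex feature in the CSV
        double weight = 1.7;    // desired relative weight for matching rows
        Random rng = new Random(42);

        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("train.csv"))) {
            String[] cols = line.split(",");
            int copies = 1;
            if ("female".equalsIgnoreCase(cols[sexColumn].trim())) {
                copies = (int) Math.floor(weight);                 // whole copies (1 here)
                if (rng.nextDouble() < weight - copies) copies++;  // extra copy with p = 0.7
            }
            for (int i = 0; i < copies; i++) out.add(line);
        }
        Files.write(Paths.get("train_oversampled.csv"), out);
    }
}
```

The wrapped-record-reader variant (option 2) follows the same logic: on each next(), check the feature column and re-emit the matching record the appropriate number of times before advancing the underlying reader.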

But you’ve also got to remember that you might run into overfitting problems if you oversample data like that.

If you’ve got enough data, you can also try it the other way around: discard examples that you’d like to weaken, thereby making the examples you want to strengthen appear relatively more often.
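A minimal sketch of that discarding direction, under the same assumptions as the oversampler above: instead of duplicating the female rows, every other row is kept only with probability 1/1.7, which yields the same 1.7x relative weight while shrinking the dataset.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class CsvUndersampler {
    public static void main(String[] args) throws IOException {
        int sexColumn = 2;      // assumed index of the sex feature in the CSV
        double weight = 1.7;    // desired relative weight for "female" rows
        Random rng = new Random(42);

        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("train.csv"))) {
            String[] cols = line.split(",");
            // keep every matching row; keep the rest with probability 1/1.7
            boolean keep = "female".equalsIgnoreCase(cols[sexColumn].trim())
                    || rng.nextDouble() < 1.0 / weight;
            if (keep) out.add(line);
        }
        Files.write(Paths.get("train_undersampled.csv"), out);
    }
}
```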

Thanks, I hadn’t thought of manipulating the record reader. I’ll give that a try.