Weighted learning based on input data / feature

Hello,

I’m looking to modify the training process by weighting input examples based on a feature, e.g. making all female examples 1.7 times as important.
So far I’ve come across the weighted loss function, which doesn’t help me since it determines weight based on the class/label rather than the feature.
A custom loss function has the same limitation, since as far as I can see I can only access the labels and preOutput arrays.

Is there a way to weight examples, other than by transforming the input data itself (e.g. by duplicating the female examples)?

Thanks in advance.

I’m not quite sure I follow what you are trying to do. Making a feature more important isn’t something you can properly do with any kind of machine learning: pretty much by definition, all learning algorithms try to use every feature you give them to fit your data as best they can.

If you just want to make sure that examples with a specific feature value get better (training) accuracy, you can make them appear more often during training, effectively unbalancing your training set if it was balanced beforehand.

What you are asking for has been asked for before: Per-example weights · Issue #2616 · eclipse/deeplearning4j · GitHub

But we currently don’t have any specific ways to do it other than manually.

Sorry for the lack of clarity; I was looking for the second option: giving more weight to specific examples without modifying the dataset.

By “doing it manually” you mean unbalancing the training set, correct?

Thank you for your speedy reply.

Yes.

You have multiple options for how to attack that:

  • You can just duplicate the relevant examples in your input data (see the sketch below)
  • You can wrap the record reader that reads your data and emit those examples multiple times
  • You can wrap the DataSetIterator and create unbalanced DataSets based on the incoming DataSet

The first two options are likely to be easier to implement correctly.
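To make the first option concrete, here is a minimal sketch in plain Java that oversamples rows of a CSV file before it ever reaches a record reader. The file names, the column index of the sex feature, and the matching value are assumptions for illustration, not taken from your setup. A non-integer weight like your 1.7 can be approximated stochastically: each matching row is kept once and gets an extra copy with probability equal to the fractional part, so matching rows carry 1.7x weight in expectation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class CsvOversampler {
    public static void main(String[] args) throws IOException {
        int sexColumn = 2;      // assumed index of the sex feature in the CSV
        double weight = 1.7;    // desired relative weight for matching rows
        Random rng = new Random(42);

        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("train.csv"))) {
            String[] cols = line.split(",");
            int copies = 1;
            if ("female".equalsIgnoreCase(cols[sexColumn].trim())) {
                copies = (int) Math.floor(weight);                 // whole copies (1 here)
                if (rng.nextDouble() < weight - copies) copies++;  // extra copy with p = 0.7
            }
            for (int i = 0; i < copies; i++) out.add(line);
        }
        Files.write(Paths.get("train_oversampled.csv"), out);
    }
}
```

The wrapped-record-reader variant (option 2) follows the same logic: on each next(), check the feature column and re-emit the matching record the appropriate number of times before advancing the underlying reader.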

But you’ve also got to remember that you might run into overfitting problems if you oversample data like that.

If you’ve got enough data, you can also try it the other way around: discard examples that you’d like to weaken, thereby making the examples you want to strengthen appear relatively more often.
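A minimal sketch of that discarding direction, under the same assumptions as the oversampler above: instead of duplicating the female rows, every other row is kept only with probability 1/1.7, which yields the same 1.7x relative weight while shrinking the dataset.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class CsvUndersampler {
    public static void main(String[] args) throws IOException {
        int sexColumn = 2;      // assumed index of the sex feature in the CSV
        double weight = 1.7;    // desired relative weight for "female" rows
        Random rng = new Random(42);

        List<String> out = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get("train.csv"))) {
            String[] cols = line.split(",");
            // keep every matching row; keep the rest with probability 1/1.7
            boolean keep = "female".equalsIgnoreCase(cols[sexColumn].trim())
                    || rng.nextDouble() < 1.0 / weight;
            if (keep) out.add(line);
        }
        Files.write(Paths.get("train_undersampled.csv"), out);
    }
}
```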

Thanks, I hadn’t thought of manipulating the record reader. I’ll give that a try.