Advice on proper use of Dataset labels

atom2galaxy · March 19, 2020, 4:01pm

Hello,

Per the documentation of the DataSet class, labels “… should be binarized label matrices such that the specified label has a value of 1 in the desired column with the label)”.
I am wondering whether it is OK to use DataSet in such a way that the INDArray parameter “second” contains row vectors whose values are floats in the (0,1) interval. In other words, instead of row vectors such as [1.0,0,0,0,0], they could be something like [0.97, 0.08, 0.01, 0.02, 0.05].
Specifically, I am interested in whether the fit function would be able to properly calculate and backpropagate errors if the labels are not in the [1.0,0,0,0,0] form, especially if I am using the sigmoid activation function in each layer, including the output layer. I am not looking to classify the output, but rather to build a non-linear model that maps features to outputs.

Thank you in advance!

Alexey

treo · March 19, 2020, 4:35pm

What you are looking to do is called “regression”, and yes, when doing regression (with an appropriate loss function), there is no problem with using labels that aren’t one-hot encoded.
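
To make this concrete, here is a minimal plain-Java sketch of mean squared error on a sigmoid output with a fractional label vector. It is not DL4J code and does not claim to mirror how DL4J implements its loss functions internally; the class and method names are made up for illustration. It only shows that the loss and its gradient with respect to the pre-activation values are well defined for any target in (0,1), which is what backpropagation needs.

```java
public class SoftLabelMse {
    // Logistic sigmoid, the output activation mentioned in the question.
    public static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Mean squared error between predictions and (possibly fractional) targets.
    public static double mse(double[] pred, double[] target) {
        double sum = 0.0;
        for (int i = 0; i < pred.length; i++) {
            double d = pred[i] - target[i];
            sum += d * d;
        }
        return sum / pred.length;
    }

    // Gradient of the MSE with respect to the pre-activations z,
    // where the prediction is sigmoid(z). Chain rule:
    // dL/dz_i = (2/n) * (p_i - t_i) * p_i * (1 - p_i)
    public static double[] gradWrtPreOutput(double[] z, double[] target) {
        double[] g = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            double p = sigmoid(z[i]);
            g[i] = 2.0 * (p - target[i]) / z.length * p * (1.0 - p);
        }
        return g;
    }

    public static void main(String[] args) {
        // Fractional label vector from the question, not one-hot.
        double[] target = {0.97, 0.08, 0.01, 0.02, 0.05};
        // Hypothetical pre-activations: all zeros, so every output is 0.5.
        double[] z = new double[target.length];

        double[] pred = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            pred[i] = sigmoid(z[i]);
        }

        System.out.println("MSE = " + mse(pred, target));
        double[] g = gradWrtPreOutput(z, target);
        for (int i = 0; i < g.length; i++) {
            System.out.println("dL/dz[" + i + "] = " + g[i]);
        }
    }
}
```

The gradient is negative where the output is below its target and positive where it is above, so a gradient step moves each output toward its fractional label, exactly as it would for a 0/1 label.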

atom2galaxy · March 19, 2020, 4:54pm

@treo,

Awesome! Thank you very much for responding so quickly!
I am looking to scale the label data to be between 0 and 1 and use MEAN_SQUARED_LOGARITHMIC_ERROR as the loss function. Does that sound reasonable to you?

treo · March 19, 2020, 4:55pm

Yes, the MSE variants are usually reasonable for regression problems.
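
For reference, a small plain-Java sketch comparing MSE with the commonly used definition of mean squared logarithmic error, mean((log(1 + prediction) - log(1 + target))^2). This is an illustration only: the class and method names are hypothetical, and it is an assumption that DL4J's MEAN_SQUARED_LOGARITHMIC_ERROR follows this definition term by term.

```java
public class MsleSketch {
    // Mean squared error.
    public static double mse(double[] pred, double[] target) {
        double sum = 0.0;
        for (int i = 0; i < pred.length; i++) {
            double d = pred[i] - target[i];
            sum += d * d;
        }
        return sum / pred.length;
    }

    // Mean squared logarithmic error, using log1p(x) = log(1 + x).
    // Requires pred[i] > -1 and target[i] > -1, which always holds
    // for sigmoid outputs and labels scaled to [0, 1].
    public static double msle(double[] pred, double[] target) {
        double sum = 0.0;
        for (int i = 0; i < pred.length; i++) {
            double d = Math.log1p(pred[i]) - Math.log1p(target[i]);
            sum += d * d;
        }
        return sum / pred.length;
    }

    public static void main(String[] args) {
        // Labels scaled to [0, 1], as proposed in the thread.
        double[] target = {0.97, 0.08, 0.01, 0.02, 0.05};
        // Hypothetical network outputs, all 0.5 for simplicity.
        double[] pred = {0.5, 0.5, 0.5, 0.5, 0.5};

        System.out.println("MSE  = " + mse(pred, target));
        System.out.println("MSLE = " + msle(pred, target));
    }
}
```

When both predictions and labels lie in [0, 1], the slope of log(1 + x) stays between 0.5 and 1, so under this definition MSLE falls between a quarter of the MSE and the MSE itself. The two losses should therefore behave similarly on labels scaled this way.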

atom2galaxy · March 19, 2020, 4:56pm

Perfect, thank you very much!
