Issue in datavec analysis

Hi Team

I am new to DataVec and, while following the DL4J quickstart, I am stuck in the DataAnalysis part.

I am getting an error at line 21 -

Sample data -
fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
7.00,0.27,0.36,20.7,0.045,45,170,1.001,3,0.45,8.8,6
6.30,0.3,0.34,1.6,0.049,14,132,0.994,3.3,0.49,9.5,6
8.10,0.28,0.4,6.9,0.05,30,97,0.9951,3.26,0.44,10.1,6
7.20,0.23,0.32,8.5,0.058,47,186,0.9956,3.19,0.4,9.9,6
7.20,0.23,0.32,8.5,0.058,47,186,0.9956,3.19,0.4,9.9,6
8.10,0.28,0.4,6.9,0.05,30,97,0.9951,3.26,0.44,10.1,6
6.20,0.32,0.16,7,0.045,30,136,0.9949,3.18,0.47,9.6,6
7.00,0.27,0.36,20.7,0.045,45,170,1.001,3,0.45,8.8,6
6.30,0.3,0.34,1.6,0.049,14,132,0.994,3.3,0.49,9.5,6
8.10,0.22,0.43,1.5,0.044,28,129,0.9938,3.22,0.45,11,6

Please help!

Thanks and Regards

Due to restrictions on new users, I was unable to provide the Issue JPG in the last post, but here it is -

If I understand it correctly, you need to skip the first row of your data to make it work.
The first line of your file holds the column labels (text), not numbers.
So you need to tell the CSVRecordReader constructor that the first line contains no useful data.

If you don't skip the first line, the analyze method will try to parse the first row as well.
But your first row doesn't contain numbers, only labels (text), so parsing it as numbers will fail.

Basically you need to do this:

val recordReader : CSVRecordReader = new CSVRecordReader(1)

The 1 is the number of lines to skip at the start of the file, so the header row is ignored (documentation here).
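To make the failure mode concrete, here is a minimal plain-Scala sketch that does not use DataVec at all. It only imitates the idea: parsing every cell as a number fails on the header row and succeeds once the first line is dropped. The object name, the helper names and the shortened three-column sample are all made up for illustration.

```scala
import scala.util.Try

object HeaderSkipSketch {
  // Header plus two data rows from the question, shortened to three columns.
  val lines: List[String] = List(
    "fixed acidity,volatile acidity,citric acid",
    "7.00,0.27,0.36",
    "6.30,0.3,0.34"
  )

  // Parse every cell of one line as a Double; fails if any cell is not numeric.
  def parseLine(line: String): Try[List[Double]] =
    Try(line.split(",").toList.map(_.trim.toDouble))

  // Drop the first `skipNumLines` lines, then parse the rest.
  // The whole result is a Failure as soon as one line cannot be parsed.
  def parseAll(skipNumLines: Int): Try[List[List[Double]]] =
    Try(lines.drop(skipNumLines).map(line => parseLine(line).get))

  def main(args: Array[String]): Unit = {
    println(parseAll(0).isFailure) // header included: "fixed acidity" is not a number
    println(parseAll(1).isSuccess) // header skipped: only numeric rows remain
  }
}
```

Passing 1 to the CSVRecordReader constructor plays the same role as `drop(1)` here: the reader never hands the label row to the analysis step.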

P.S.
It is better to use a GitHub Gist for posting code.
Reading code from an image is very uncomfortable. :wink:

Hi @StoicProgrammer

This code is in a Scala worksheet and is just a few lines, so I thought it could be posted directly.

I appreciate the help, even though reading the code from an image was uncomfortable.

Thanks :blush:

Hi @StoicProgrammer

I was able to move forward, thanks for the suggestion. But after transforming all the columns as needed, I am getting an error at line 58:


Please zoom in on the image to read the code.

Error -
java.util.NoSuchElementException: Unknown column: "quality"
    at org.datavec.api.transform.schema.Schema.getIndexOfColumn(Schema.java:253)

Workaround -
I removed line 58 and the solution works.

Please let me know what this line is for and how to fix the code.

Thanks and Regards

The error comes from the .categoricalToOneHot("quality") transform: when it runs, it replaces your one column with 9 columns, so no single column is called "quality" anymore.

Because the RecordReaderDataSetIterator will do this transformation automatically, you can just remove it (e.g. line 49) and it should work.
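The effect of the one-hot step on column names can be sketched in plain Scala, again without DataVec. The "schema" here is just a list of column names, the nine category states are hypothetical (picked only to match the count of nine mentioned above), and the `quality[state]` naming is an assumption for illustration. The point is that a by-name lookup of "quality" works before the expansion and fails after it.

```scala
object OneHotSketch {
  // Simplified stand-in for a schema: an ordered list of column names.
  val before: List[String] = List("pH", "sulphates", "alcohol", "quality")

  // Hypothetical category states; only their count (nine) matters here.
  val states: List[String] = (1 to 9).map(_.toString).toList

  // Replace `column` with one new column per state; the original name disappears.
  def oneHot(columns: List[String], column: String, states: List[String]): List[String] =
    columns.flatMap { c =>
      if (c == column) states.map(s => column + "[" + s + "]") else List(c)
    }

  // Imitates a by-name column lookup that throws for unknown columns.
  def indexOf(columns: List[String], name: String): Int = {
    val i = columns.indexOf(name)
    if (i < 0) throw new NoSuchElementException("Unknown column: \"" + name + "\"")
    i
  }

  def main(args: Array[String]): Unit = {
    val after = oneHot(before, "quality", states)
    println(after.mkString(", "))
    println(indexOf(before, "quality")) // found in the original column list
    // indexOf(after, "quality") would throw NoSuchElementException here
  }
}
```

Any later step that still refers to the label column by the name "quality" runs into the same kind of lookup failure, which is why dropping the explicit one-hot transform resolves the error.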

Thanks @treo.

That let me know what the line does and how to fix my issue.