Modifying UCI Example

I think the big misunderstanding comes from the way the original data looks:
The original data is formatted as 600 rows, and each row holds one complete sequence with its values separated by whitespace.

At line 180 of UCISequenceClassificationExample, the data is split into those 600 individual rows.

The for loop in lines 193 to 198 then does two things:

  1. it transposes each row into a column by replacing the whitespace between the numbers with new lines
  2. it relies on integer division discarding the remainder to turn the row index into a class label, and stores each column together with its label as a pair
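
To make those two steps concrete, here is a minimal sketch in plain Java. It is not the code from the example itself: the class name, the method names, and the toy value of rowsPerClass are all made up for illustration, and it assumes the rows in the file are grouped by class so that the row index alone determines the label.

```java
class TransposeAndLabel {

    /** Turns one whitespace-separated row into a column with one value per line. */
    static String transpose(String row) {
        return row.trim().replaceAll("\\s+", "\n");
    }

    /**
     * Integer division discards the remainder, so the first rowsPerClass rows
     * get label 0, the next rowsPerClass rows get label 1, and so on.
     */
    static int labelFor(int rowIndex, int rowsPerClass) {
        return rowIndex / rowsPerClass;
    }

    public static void main(String[] args) {
        // Toy data: four rows, two rows per class (hypothetical values).
        String[] rows = {"1.0 2.0 3.0", "4.0  5.0 6.0", "7.0 8.0 9.0", "10.0 11.0 12.0"};
        int rowsPerClass = 2;
        for (int i = 0; i < rows.length; i++) {
            System.out.println("label " + labelFor(i, rowsPerClass) + ":");
            System.out.println(transpose(rows[i]));
        }
    }
}
```

With your own data you would replace rowsPerClass with however many consecutive rows belong to each class, or drop the division entirely if your labels come from somewhere else.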

Then the shuffle on line 201 shuffles those pairs, because the data is going to be read in linear order later on, and we want shuffled batches.
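
The shuffle step can be sketched roughly like this. The pair type (Map.Entry), the class and method names, and the seed are my own choices for illustration, not necessarily what the example uses. The point is only that whole pairs are shuffled, so each sequence keeps its label.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ShufflePairs {

    /** Returns a shuffled copy; a fixed seed makes the order reproducible. */
    static List<Map.Entry<String, Integer>> shuffled(List<Map.Entry<String, Integer>> pairs, long seed) {
        List<Map.Entry<String, Integer>> copy = new ArrayList<>(pairs);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    /** Builds six toy (sequence, label) pairs with two sequences per class. */
    static List<Map.Entry<String, Integer>> demoPairs() {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            pairs.add(new SimpleEntry<>("sequence-" + i, i / 2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> pair : shuffled(demoPairs(), 42L)) {
            System.out.println(pair.getKey() + " -> label " + pair.getValue());
        }
    }
}
```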

Finally, the for loop in lines 206 to 222 writes the data to output files. Once enough training data has been written, it writes the remaining pairs as test data. For each pair it writes two files:

  • train/features/#.csv contains several lines of numbers, with each line being a single timestep in the sequence with just a single feature
  • train/labels/#.csv contains just a single number on a single line, the label for the whole sequence
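
As a rough sketch of that file layout, using only java.nio: the class name, the helper methods, the parallel-list arguments, and the trainCount parameter are hypothetical and only there to show the structure. The test/ directory is assumed to mirror train/.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class WriteSplit {

    /** Writes the first trainCount pairs under train/, the rest under test/. */
    static void write(List<String> sequences, List<Integer> labels, int trainCount, Path baseDir) {
        try {
            int trainIdx = 0;
            int testIdx = 0;
            for (int i = 0; i < sequences.size(); i++) {
                boolean isTrain = i < trainCount;
                String split = isTrain ? "train" : "test";
                int n = isTrain ? trainIdx++ : testIdx++; // file numbering restarts per split
                Path features = baseDir.resolve(split).resolve("features").resolve(n + ".csv");
                Path label = baseDir.resolve(split).resolve("labels").resolve(n + ".csv");
                Files.createDirectories(features.getParent());
                Files.createDirectories(label.getParent());
                // Features file: one timestep per line, one feature per timestep.
                Files.write(features, sequences.get(i).getBytes(StandardCharsets.UTF_8));
                // Labels file: a single number for the whole sequence.
                Files.write(label, String.valueOf(labels.get(i)).getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a whole file as a string. */
    static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes three toy pairs (two train, one test) into a temp directory. */
    static Path demo() {
        try {
            Path dir = Files.createTempDirectory("uci-sketch");
            write(Arrays.asList("1.0\n2.0", "3.0\n4.0", "5.0\n6.0"), Arrays.asList(0, 1, 1), 2, dir);
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Wrote files under " + demo());
    }
}
```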

These files are later read by CSVSequenceRecordReader, and the features and labels are joined in such a way that the label aligns with the end of the sequence.

As far as I can tell, it works exactly as it should, and you have already achieved that goal.