DatasetIterator for music generation


I’m making a neural network model for generating music. The model would be taught by trying to recreate existing compositions.

I’m trying to create a dataset iterator which would feed its data to an LSTM-type network. The data consists of about 50 compositions.

Each composition is stored as a CSV file in which the rows represent timestamps (the first row is the beginning of the composition and the last one is its end). In other words, each row represents the state of the piano keyboard during one 16th note.

Each row has 88 columns. The columns represent keys on a piano keyboard. The number of keys will be reduced if the data model proves to be too complex.

Each composition is basically a 2D array where element [i,j] is the state of the j-th piano key at the i-th timestamp: 0 if the key is resting, 1 if it is pressed. Note that different compositions have different numbers of rows.
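To make that layout concrete, here is a minimal sketch of loading one composition into such a 2D array. It uses only plain Java; the class name `PianoRollCsv` and the `parse` helper are made up for illustration, and the lines are assumed to have been read from the CSV file already.

```java
import java.util.Arrays;
import java.util.List;

class PianoRollCsv {

    /**
     * Parses CSV lines into roll[i][j]: the state (0 or 1) of key j at timestamp i.
     * `keys` would be 88 for the full keyboard described above.
     */
    static int[][] parse(List<String> lines, int keys) {
        int[][] roll = new int[lines.size()][keys];
        for (int i = 0; i < lines.size(); i++) {
            String[] cells = lines.get(i).split(",");
            if (cells.length != keys) {
                throw new IllegalArgumentException(
                        "row " + (i + 1) + " has " + cells.length + " columns, expected " + keys);
            }
            for (int j = 0; j < keys; j++) {
                int value = Integer.parseInt(cells[j].trim());
                if (value != 0 && value != 1) {
                    throw new IllegalArgumentException("row " + (i + 1) + ": expected 0 or 1");
                }
                roll[i][j] = value;
            }
        }
        return roll;
    }

    public static void main(String[] args) {
        // Toy keyboard with 3 keys instead of 88, only to keep the example short.
        int[][] roll = parse(Arrays.asList("0, 1, 0", "0, 1, 0", "1, 0, 1"), 3);
        System.out.println(roll.length + " timestamps, " + roll[0].length + " keys");
    }
}
```

Since compositions differ in length, each file would give an array with its own number of rows.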

A simple melody could be written in CSV as follows:

  1. 0, 0, 0, 1, 0, 0, 0… (dots represent 81 more zeroes)
  2. 0, 0, 0, 1, 0, 0, 0…
  3. 0, 0, 0, 0, 1, 0, 0…
  4. 0, 0, 0, 0, 1, 0, 0…
  5. 0, 0, 0, 0, 0, 1, 0…
  6. 0, 0, 0, 0, 0, 1, 0…

Note that more than one key can be pressed at a given timestamp, which could be written as:

 0, 0, 1, 0, 1, 0, 1...

I would like to train the network on existing music by feeding it row 1 and trying to predict row 2. Then I’d feed it rows 1, 2 and try to predict row 3, and so on. Basically, the input dataset already holds the output dataset within itself.

This is what it looks like in pseudocode:

for (int i = 1; i < compositionLength; i++) {
    for (int j = 0; j < i; j++) {
        feed the network timestamp j;    // timestamps 0 .. i-1
    }
    predict timestamp i;
    update the network weights based on the prediction error;
    “clean” the network input before the next pass;
}

Any suggestions on how to create an iterator for this problem? I’ve been looking at the quickstart tutorials and the Eclipse examples for a while now, but I’m not really sure how to accomplish this.
Also, I apologize if this question has already come up; if so, I missed it.

Suggestions would be very much appreciated!

What you are trying to do here is known as an auto-regressive problem, as you want to predict the next step from the previous steps.
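As a minimal sketch of that setup, assuming a composition is already loaded as a `roll[time][key]` array (the class name `NextStepPairs` is made up for illustration, and no DL4J classes are used): the labels are simply the inputs shifted by one timestamp. With a one-directional recurrent network, the output at each step depends only on the steps seen so far, so one pass over the sequence should cover every "feed the first i rows, predict the next one" case.

```java
import java.util.Arrays;

class NextStepPairs {
    final int[][] features; // timestamps 0 .. T-2
    final int[][] labels;   // timestamps 1 .. T-1, i.e. features shifted by one step

    NextStepPairs(int[][] roll) {
        if (roll.length < 2) {
            throw new IllegalArgumentException("need at least two timestamps");
        }
        // copyOfRange copies the outer array only; the rows are shared with `roll`.
        features = Arrays.copyOfRange(roll, 0, roll.length - 1);
        labels = Arrays.copyOfRange(roll, 1, roll.length);
    }

    public static void main(String[] args) {
        // Toy composition: 3 timestamps, 2 keys.
        int[][] roll = { {1, 0}, {0, 1}, {1, 1} };
        NextStepPairs pairs = new NextStepPairs(roll);
        for (int t = 0; t < pairs.features.length; t++) {
            System.out.println(Arrays.toString(pairs.features[t])
                    + " -> " + Arrays.toString(pairs.labels[t]));
        }
    }
}
```

Converting these arrays into the layout your network expects, and padding or masking compositions of different lengths, is left out of the sketch.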

I’ve talked about some of the details of setting up the data for such a problem in this post: Training data shapping - #2 by treo

When it comes to composing music with deep learning, and even DL4J in particular, there is this example: deeplearning4j-examples/dl4j-examples/src/main/java/org/deeplearning4j/examples/wip/advanced/modelling/melodl4j at master · eclipse/deeplearning4j-examples · GitHub

It reads MIDI instead of the much simpler representation that you propose, and unfortunately it isn’t working quite correctly.

Thank you for the references! I’m still not quite clear on how to build the proposed dataset, though.

I will probably go for a simpler approach: breaking the compositions into smaller files, each containing a 15-step time series, with the 16th timestep as the “label”.
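The same windows could probably also be cut in memory instead of writing separate files. Here is a rough sketch in plain Java, assuming a composition is already loaded as a `roll[time][key]` array; the names `WindowedExamples`, `Example` and `slice` are made up for illustration, and sliding the window forward by one step at a time is my assumption, not something fixed by the approach above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WindowedExamples {

    /** One training example: `input` holds consecutive rows, `label` is the row right after them. */
    static final class Example {
        final int[][] input;
        final int[] label;

        Example(int[][] input, int[] label) {
            this.input = input;
            this.label = label;
        }
    }

    /** Slides a window of `window` rows over the composition, one step at a time. */
    static List<Example> slice(int[][] roll, int window) {
        List<Example> examples = new ArrayList<>();
        for (int start = 0; start + window < roll.length; start++) {
            int[][] input = Arrays.copyOfRange(roll, start, start + window);
            examples.add(new Example(input, roll[start + window]));
        }
        return examples;
    }

    public static void main(String[] args) {
        // Toy composition: 20 timestamps, 3 keys, one key pressed per step.
        int[][] roll = new int[20][3];
        for (int t = 0; t < roll.length; t++) {
            roll[t][t % 3] = 1;
        }
        List<Example> examples = slice(roll, 15);
        System.out.println(examples.size() + " examples of "
                + examples.get(0).input.length + " steps each");
    }
}
```

A composition with fewer than 16 rows produces no examples here, so very short pieces would be skipped.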