Sample data sets

adonnini · February 8, 2023, 1:58pm

Hi,
I am a complete newbie with respect to DL4J. I have been looking for example datasets without success.
I am looking to use DL4J for a location tracking prediction application. My dataset consists of latitude,longitude,time entries.
Could someone kindly point me to a location where I could find sample datasets for DL4J applications?
Thank you,
Alex Donnini

agibsonccc · February 9, 2023, 6:47am

@adonnini could you clarify what you mean? Datasets aren’t usually part of a framework.
There are certainly ready-made datasets for basic problems like image classification, but a framework won’t have every dataset you need. That’s normally up to the end user.

If you mean how to work with certain kinds of data, find a sample dataset that reflects what you want to do (sometimes on sites like Kaggle or from research papers) and we can teach you how to work with that dataset.

adonnini · February 9, 2023, 7:49pm

Hi Adam,

Thanks for getting back to me.

I have an Android application which tracks user location and runs mobile
analytics. Unlike most (all?) other similar applications, no user location
information or user mobility profiling leaves the device. All
processing is on-device.

I would like to add a movement prediction function.

I think this would be a good application for a neural net.

The data I work with has a simple structure. It consists of rows with
the following information:

latitude,longitude,elevation,time

Later on, I would probably include additional location related information.

I have been looking for a framework which I could use to try a number of
different approaches and which supports Android.

At this point in time, I think DL4J seems to offer the best fit for my
needs.

I asked the question about datasets because I wanted to take a look at
the kinds of datasets used for DL4J applications.

However, this may not be necessary or very useful since I do have a
dataset with a well defined format.

My question is: could I use my dataset as is to try out some
of the neural nets supported by DL4J? Or is this premature?

Clearly, just picking a neural net at random and trying to run my dataset
through it probably does not make much sense.

If it’s not asking too much, I would appreciate guidance on how I should
proceed.

Thanks,

Alex Donnini

agibsonccc · February 10, 2023, 7:29am

@adonnini usually LSTMs or simple dense layers should do the trick then. I would start with a CSV example, convert your sensor output to that format, and see if you can build a model.
This might work for you as a starting point:

https://github.com/deeplearning4j/deeplearning4j-examples/blob/686db99fee3d4825ee70663e1a15aa8d6216f2c2/dl4j-examples/src/main/java/org/deeplearning4j/examples/quickstart/modeling/recurrent/UCISequenceClassification.java#L112

    private static File featuresDirTrain = new File(baseTrainDir, "features");
    private static File labelsDirTrain = new File(baseTrainDir, "labels");
    private static File baseTestDir = new File(baseDir, "test");
    private static File featuresDirTest = new File(baseTestDir, "features");
    private static File labelsDirTest = new File(baseTestDir, "labels");

    public static void main(String[] args) throws Exception {
        downloadUCIData();

        // ----- Load the training data -----
        //Note that we have 450 training files for features: train/features/0.csv through train/features/449.csv
        SequenceRecordReader trainFeatures = new CSVSequenceRecordReader();
        trainFeatures.initialize(new NumberedFileInputSplit(featuresDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));
        SequenceRecordReader trainLabels = new CSVSequenceRecordReader();
        trainLabels.initialize(new NumberedFileInputSplit(labelsDirTrain.getAbsolutePath() + "/%d.csv", 0, 449));

        int miniBatchSize = 10;
        int numLabelClasses = 6;
        DataSetIterator trainData = new SequenceRecordReaderDataSetIterator(trainFeatures, trainLabels, miniBatchSize, numLabelClasses,
            false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);

I would strongly recommend understanding the basics of ML though. It sounds like you might be a complete beginner. Try to see if building a model is possible as well as making sure it’s deployable.
Models that take too long to run on different Android phones might not work out for you.

Time series data like this is fairly noisy, so a model may not predict it well.
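
As a rough sketch of the CSV conversion step, assuming rows in the latitude,longitude,elevation,time format described earlier in the thread: the plain-Java helper below slices one track into fixed-length windows and pairs each window with the position that follows it. TrackWindows, its method names, and the window length are hypothetical and not part of DL4J. Writing each window to numbered files such as features/0.csv and labels/0.csv (the layout the linked example reads) is left out.

```java
// Hypothetical helper, not part of DL4J: turns one GPS track into
// (window of past fixes, next position) pairs rendered as CSV text.
class TrackWindows {

    /** Parses "latitude,longitude,elevation,time" rows into a [rows][4] array. */
    static double[][] parse(String[] rows) {
        double[][] track = new double[rows.length][4];
        for (int i = 0; i < rows.length; i++) {
            String[] parts = rows[i].trim().split(",");
            if (parts.length != 4) {
                throw new IllegalArgumentException("Expected 4 columns: " + rows[i]);
            }
            for (int j = 0; j < 4; j++) {
                track[i][j] = Double.parseDouble(parts[j].trim());
            }
        }
        return track;
    }

    /** Number of windows that still have a following fix to use as the label. */
    static int windowCount(int rowCount, int windowLength) {
        return Math.max(0, rowCount - windowLength);
    }

    /** CSV text for window k: fixes k .. k + windowLength - 1, one fix per line. */
    static String featuresCsv(double[][] track, int k, int windowLength) {
        StringBuilder sb = new StringBuilder();
        for (int i = k; i < k + windowLength; i++) {
            sb.append(track[i][0]).append(',').append(track[i][1]).append(',')
              .append(track[i][2]).append(',').append(track[i][3]).append('\n');
        }
        return sb.toString();
    }

    /** CSV text for the label of window k: latitude,longitude of the next fix. */
    static String labelCsv(double[][] track, int k, int windowLength) {
        double[] next = track[k + windowLength];
        return next[0] + "," + next[1] + "\n";
    }

    public static void main(String[] args) {
        // Made-up sample rows; time is seconds since the start of the track.
        String[] rows = {
            "45.5,9.25,120.0,0.0",
            "45.75,9.5,121.0,5.0",
            "46.0,9.75,122.0,10.0"
        };
        double[][] track = parse(rows);
        int windowLength = 2;
        for (int k = 0; k < windowCount(track.length, windowLength); k++) {
            System.out.print("features/" + k + ".csv:\n" + featuresCsv(track, k, windowLength));
            System.out.print("labels/" + k + ".csv:\n" + labelCsv(track, k, windowLength));
        }
    }
}
```

Predicting the next position is a regression target, whereas the linked example predicts one of six classes, so the reader and iterator setup would likely need to differ from the snippet quoted above. Raw coordinates and timestamps would normally also be scaled before training.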

adonnini · February 10, 2023, 12:30pm

Hi Adam,

In my app, to date I have been using clustering (DBSCAN, k-means++) with
various distance metrics. I have also developed my own algorithms for
(simple) ML.

As you guessed, I am a complete beginner wrt neural nets. I have been
reading a number of papers. At a high level, I think I understand the
overall process of how a neural net works.

I am still trying to understand the processing that takes place when
information transits inside a node. In my case, when information arrives
at a node, in addition to the actions related to the neural net
information flow, an algorithm (I have my own location information
processing algorithms) needs to process the location information.
Somehow, I think I am missing something and/or misunderstanding
something pretty fundamental regarding how neural nets work.

Thanks very much for giving me a start. I really appreciate it.

agibsonccc · February 10, 2023, 12:38pm

@adonnini the main thing to understand is the idea of vectorization. Neural nets (at least dense layers) are no different than very fancy regression algorithms.

If you pretend that a dense layer is a logistic regression then you’d see that the feature extraction would be similar.
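
To make the comparison concrete, here is a minimal plain-Java sketch, assuming a sigmoid activation (the class and method names are made up for illustration and are not DL4J API). A single unit computes sigmoid(w . x + b), which is the logistic regression formula, and a dense layer applies several such units to the same feature vector.

```java
// Illustrative only: a dense layer written out as a stack of logistic regressions.
class DenseUnit {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One unit: sigmoid(w . x + b), the same formula as logistic regression. */
    static double unit(double[] w, double b, double[] x) {
        double z = b;
        for (int i = 0; i < x.length; i++) {
            z += w[i] * x[i];
        }
        return sigmoid(z);
    }

    /** A dense layer: several units, each with its own weights, on the same input. */
    static double[] layer(double[][] weights, double[] biases, double[] x) {
        double[] out = new double[weights.length];
        for (int u = 0; u < weights.length; u++) {
            out[u] = unit(weights[u], biases[u], x);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] features = {0.2, 0.7, 0.1};   // an already vectorized, scaled input
        double[][] weights = {{0.5, -1.0, 2.0}, {1.5, 0.3, -0.4}};
        double[] biases = {0.0, 0.1};
        for (double activation : layer(weights, biases, features)) {
            System.out.println(activation);
        }
    }
}
```

Either way the input is just a vector of numbers, which is why the feature extraction work is the same as for a regression model.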

Usually for your data you have to do various preprocessing steps, such as:

Scaling to the range 0 to 1, or standardizing to zero mean and unit variance
Encoding categorical data as a set of 0/1 features (one-hot encoding)
Encoding some features with discretization (converting a continuous value to a single 0/1 outcome or something similar)
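
The three steps above can be sketched in plain Java as follows. This is only an illustration under assumptions: the Preprocessing class, the activity categories, and the speed threshold are hypothetical and do not come from DL4J or from the poster's data.

```java
// Illustrative preprocessing helpers; names and example values are hypothetical.
class Preprocessing {

    /** Min-max scaling: maps the smallest value to 0 and the largest to 1. */
    static double[] minMaxScale(double[] v) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : v) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double[] out = new double[v.length];
        double range = max - min;
        for (int i = 0; i < v.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (v[i] - min) / range;   // constant column -> zeros
        }
        return out;
    }

    /** Standardization: subtract the mean, divide by the (population) standard deviation. */
    static double[] standardize(double[] v) {
        double mean = 0.0;
        for (double x : v) mean += x;
        mean /= v.length;
        double variance = 0.0;
        for (double x : v) variance += (x - mean) * (x - mean);
        double std = Math.sqrt(variance / v.length);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = std == 0.0 ? 0.0 : (v[i] - mean) / std;
        }
        return out;
    }

    /** One-hot encoding: a vector of zeros with a single 1 at the category index. */
    static double[] oneHot(int categoryIndex, int numCategories) {
        double[] out = new double[numCategories];
        out[categoryIndex] = 1.0;
        return out;
    }

    /** Discretization to one 0/1 feature: 1 if the value exceeds the threshold. */
    static double binarize(double value, double threshold) {
        return value > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        double[] elevation = {120.0, 135.0, 150.0};
        System.out.println(java.util.Arrays.toString(minMaxScale(elevation)));
        System.out.println(java.util.Arrays.toString(standardize(elevation)));
        // Hypothetical categories: 0 = walking, 1 = cycling, 2 = driving
        System.out.println(java.util.Arrays.toString(oneHot(1, 3)));
        // Hypothetical rule: "moving" if speed is above 0.5 m/s
        System.out.println(binarize(1.2, 0.5));
    }
}
```

In practice the minimum, maximum, mean, and standard deviation are computed on the training data only and then reused unchanged for test data and on-device inference.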

There are many more tricks (sometimes people use more advanced ones like embeddings).

So for your algorithms (don’t worry I don’t want your secret sauce!) if you can use the more common tricks to turn those algorithms into features you’ll do great.

Note there’s an additional layer of complexity with hyperparameter tuning to consider (learning rates, number of layers, loss functions, ...)

Some of this stuff has conventions but some of it you just have to react to the way the training is going.

That tuning is the same kind you would do with logistic regression (the learning rate being the main parameter to start with).
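
As a small illustration of why the learning rate is the first knob to look at, the sketch below trains a one-feature logistic regression with plain batch gradient descent on a tiny made-up dataset and compares two learning rates over the same number of epochs. Everything in it (class name, data, the rates 0.5 and 0.01) is an assumption chosen for the example, not a recommendation.

```java
// Illustrative only: logistic regression trained by batch gradient descent.
class LearningRateDemo {

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Mean cross-entropy loss of the model p = sigmoid(w * x + b). */
    static double loss(double[] x, int[] y, double w, double b) {
        double eps = 1e-12;   // keeps log() away from zero
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double p = sigmoid(w * x[i] + b);
            sum -= y[i] * Math.log(p + eps) + (1 - y[i]) * Math.log(1.0 - p + eps);
        }
        return sum / x.length;
    }

    /** Trains from w = 0, b = 0 and returns the loss after the given number of epochs. */
    static double train(double[] x, int[] y, double learningRate, int epochs) {
        double w = 0.0;
        double b = 0.0;
        for (int epoch = 0; epoch < epochs; epoch++) {
            double gradW = 0.0;
            double gradB = 0.0;
            for (int i = 0; i < x.length; i++) {
                double error = sigmoid(w * x[i] + b) - y[i];
                gradW += error * x[i];
                gradB += error;
            }
            w -= learningRate * gradW / x.length;
            b -= learningRate * gradB / x.length;
        }
        return loss(x, y, w, b);
    }

    public static void main(String[] args) {
        double[] x = {-2.0, -1.0, 1.0, 2.0};   // one already scaled feature
        int[] y = {0, 0, 1, 1};
        System.out.println("initial loss:      " + loss(x, y, 0.0, 0.0));
        System.out.println("lr = 0.5,  100 ep: " + train(x, y, 0.5, 100));
        System.out.println("lr = 0.01, 100 ep: " + train(x, y, 0.01, 100));
    }
}
```

On this toy data both runs reduce the loss, but the smaller rate gets much less done in the same number of epochs. On real data a rate that is too large can make training unstable instead, which is the part you have to watch and react to.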

adonnini · February 10, 2023, 6:03pm

Hi Adam,

This is very helpful. It clarifies things quite a bit. Thanks!

Now, I have a pretty clear path, and to-do list.

I hope you won’t mind if I touch base with you again in future, if I
have any questions. I’ll do my best to keep questions to a minimum.

Thanks,

Alex

agibsonccc · February 10, 2023, 9:33pm

@adonnini sure just post here! Everyone benefits from public posts!
