Best Bet for Clustering

pilldom · December 6, 2020, 6:20pm

Hey, I’ve been working on a few different networks and practicing with the examples and I think I am getting the overall flow.

However, I’m having trouble finding what type of network/example is the best fit for what I’m hoping to do.

I have a CSV file whose rows I am hoping to cluster using an ANN. My problem is that what I am trying to do doesn’t fit typical clustering algorithms like K-means well, since I don’t have a predefined number of clusters. I also don’t have a predetermined “label” for each row.

Is this going to essentially nullify any benefit of deep learning, since it removes the network’s ability to properly test the data set afterwards? Is my best bet just to use something like K-means and then experiment with different numbers of clusters?

treo · December 7, 2020, 8:42am

If your data already forms clusters on its own, something like DBSCAN (not implemented in DL4J) may be worth trying. It is density-based, so it does not need the number of clusters up front.
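To make the DBSCAN idea concrete, here is a minimal sketch in plain Python (standard library only, not DL4J). The function names, the toy points, and the `eps` / `min_pts` values are all assumptions chosen for illustration; a real data set would need its own tuning.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
n_clusters = len(set(labels) - {-1})
print(labels)       # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(n_clusters)   # 2
```

Note that the number of clusters falls out of the density parameters rather than being specified in advance, which is the property that matters for the question above.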

If your data doesn’t appear to be particularly clustered in its original domain, you should first try PCA on it.
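As a rough illustration of what PCA does, the sketch below finds only the first principal component using power iteration on the covariance matrix, in plain Python. The function name, the starting vector, and the toy rows are assumptions; in practice you would use a library implementation that returns several components.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    # Assumes the start vector is not orthogonal to the top eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the component.
    scores = [sum(c[k] * v[k] for k in range(d)) for c in centered]
    return v, scores

# Hypothetical toy rows lying exactly on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(rows)
print(direction)  # approximately (1, 2) / sqrt(5)
print(scores)     # 1-D coordinates of each row along that direction
```

The `scores` list is the reduced representation you would hand to a clustering algorithm afterwards.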

If you want to use deep learning for clustering, you are probably looking for some kind of dimensionality reduction. This means that you should look into using autoencoders to transform your high dimensional data into something lower dimensional.
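To show the autoencoder idea at its smallest, here is a sketch of a linear autoencoder with a single hidden unit, trained by plain stochastic gradient descent in Python. It is not DL4J code; the initial weights, learning rate, epoch count, and toy rows are assumptions, and a real model would add biases, nonlinearities, and more layers.

```python
def train_linear_autoencoder(rows, lr=0.02, epochs=2000):
    """Compress each row to one number z = w.x and reconstruct it as v * z."""
    d = len(rows[0])
    w = [0.1] * d  # encoder weights (assumed small constant init)
    v = [0.1] * d  # decoder weights

    def loss():
        total = 0.0
        for x in rows:
            z = sum(wk * xk for wk, xk in zip(w, x))
            total += sum((vk * z - xk) ** 2 for vk, xk in zip(v, x))
        return total / len(rows)

    initial = loss()
    for _ in range(epochs):
        for x in rows:
            z = sum(wk * xk for wk, xk in zip(w, x))          # encode
            r = [vk * z - xk for vk, xk in zip(v, x)]         # reconstruction error
            g = sum(rk * vk for rk, vk in zip(r, v))          # error passed back to z
            for k in range(d):
                v_grad = 2 * r[k] * z
                w_grad = 2 * g * x[k]
                v[k] -= lr * v_grad
                w[k] -= lr * w_grad
    return w, v, initial, loss()

# Hypothetical centered toy rows on the line y = 2x (2-D data, 1-D structure).
rows = [(t, 2 * t) for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, v, initial_loss, final_loss = train_linear_autoencoder(rows)
codes = [sum(wk * xk for wk, xk in zip(w, x)) for x in rows]
print(initial_loss, final_loss)
print(codes)  # the 1-D representation you would cluster on
```

No labels are needed here: the training target is the input itself, so the reconstruction error is what you measure afterwards.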

But then you are still faced with having to use an actual clustering algorithm on it. K-Means has become somewhat of a default baseline, because it is easy to explain and teach, but there are other approaches too.
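If you do go with K-Means and experiment with the number of clusters, one common approach is to run it for several values of k and look for the point where the within-cluster sum of squares stops dropping sharply. Below is a sketch in plain Python; the farthest-first initialization, the "largest drop ratio" rule, and the toy points are assumptions picked to keep the example deterministic, not a recommendation from this thread.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first_centers(points, k):
    """Deterministic init: start at points[0], then keep adding the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_first_centers(points, k)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                          for p in points]
        if new_assignment == assignment:
            break                      # assignments stable: converged
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                # keep the old center if a cluster empties
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return sum(sq_dist(p, centers[a]) for p, a in zip(points, assignment))

# Hypothetical toy data: three well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 10), (5, 11), (6, 10), (6, 11)]

inertia = {k: kmeans_inertia(points, k) for k in range(1, 6)}
# Crude elbow rule: pick the k whose step from k-1 gives the largest drop ratio.
best_k = max(range(2, 6), key=lambda k: inertia[k - 1] / inertia[k])
for k in sorted(inertia):
    print(k, round(inertia[k], 3))
print(best_k)  # 3 for this toy data
```

On real data the elbow is rarely this clean, so it is worth plotting the curve and checking a second criterion before settling on a value.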

For more on this, take a look at the following two articles, which were the top two results when googling for cluster algorithms:
