Recommended method for finding archetypes or clusters

I wish to cluster users in a database, where each user is represented by a number of features, some discrete and some continuous. The aim is to define a small number of archetypal “users”, each with a specific set of features. All other users are then categorized according to which of these archetypes they are most similar to. An important consideration is that I expect the features to have strong dependency structures, and I would like the method to make these dependencies explicitly visible.
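One commonly used approach for categorical features like these is latent class analysis: a finite mixture in which each class has its own categorical distribution for every feature, fitted by expectation-maximization (EM). The sketch below is a minimal pure-Python illustration, not a definitive implementation, and it assumes every feature is categorical (continuous features would need, for example, a per-class Gaussian component, which is not shown). The function name `fit_latent_classes` and the toy data are hypothetical. Each fitted class profile can be read as an archetype, and each user is assigned to the class with the highest posterior probability.

```python
import math
import random


def fit_latent_classes(users, n_classes, n_iter=100, seed=0):
    """Fit a latent class model (mixture of independent categoricals) by EM.

    Hypothetical helper. users: list of equal-length tuples of categorical
    values, one per user. Returns (weights, profiles, assignments), where
    profiles[k][j] maps each value of feature j to P(value | class k),
    i.e. the "archetype" of class k.
    """
    rng = random.Random(seed)
    n_users, n_feat = len(users), len(users[0])
    levels = [sorted({u[j] for u in users}) for j in range(n_feat)]

    # Random soft memberships break the symmetry between classes.
    resp = []
    for _ in users:
        r = [rng.random() + 1e-3 for _ in range(n_classes)]
        total = sum(r)
        resp.append([x / total for x in r])

    for _ in range(n_iter):
        # M-step: class weights and per-feature categorical distributions.
        # Add-one smoothing keeps every probability strictly positive.
        weights = [sum(r[k] for r in resp) / n_users for k in range(n_classes)]
        profiles = []
        for k in range(n_classes):
            prof = []
            for j in range(n_feat):
                counts = {v: 1.0 for v in levels[j]}
                for u, r in zip(users, resp):
                    counts[u[j]] += r[k]
                total = sum(counts.values())
                prof.append({v: c / total for v, c in counts.items()})
            profiles.append(prof)

        # E-step: posterior class membership, computed in log space.
        resp = []
        for u in users:
            logs = [
                math.log(weights[k])
                + sum(math.log(profiles[k][j][u[j]]) for j in range(n_feat))
                for k in range(n_classes)
            ]
            top = max(logs)
            p = [math.exp(lp - top) for lp in logs]
            total = sum(p)
            resp.append([x / total for x in p])

    # Hard assignment: each user goes to its most probable archetype.
    assignments = [max(range(n_classes), key=lambda k: r[k]) for r in resp]
    return weights, profiles, assignments


# Hypothetical toy data: (gender, city, favorite color) per user.
users = [("m", "A", "red")] * 20 + [("f", "B", "blue")] * 20
weights, profiles, assignments = fit_latent_classes(users, n_classes=2)
for k, prof in enumerate(profiles):
    # Class weight and the most likely value of each feature in archetype k.
    print(k, round(weights[k], 2), [max(p, key=p.get) for p in prof])
```

Within a class the features are treated as independent (the "local independence" assumption of latent class models), so a dependency such as favorite color varying with gender shows up as a difference between class profiles rather than inside any single one.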

For example, say the features per user are:

  • gender (m/f)
  • location (one of 10 cities)
  • favorite color (red/green/blue)

Let’s say that we have N users and that favorite color is a random variable dependent on gender and city. How can we discover possible strong correlations between favorite color and gender and/or location? There are a number of clustering techniques, from k-NN and k-means to matrix factorization and even PCA, but many of them seem to hide the underlying correlations that tie the users together.
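Pairwise dependencies like the ones asked about here can also be measured directly from counts, before or alongside any clustering. A minimal sketch, assuming all three features are categorical: empirical mutual information is zero when two variables are independent in the sample and grows with dependence, and the conditional table P(color | gender) shows the direction of the effect. The helper names `mutual_information` and `conditional_table` and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Empirical mutual information (in nats) between two discrete variables.

    pairs: list of (x, y) observations. Returns 0 when x and y are
    independent in the sample; larger values indicate stronger dependence.
    """
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def conditional_table(pairs):
    """Estimate P(y | x) from counts as a nested dict {x: {y: prob}}."""
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    return {x: {y: c / px[x] for (xx, y), c in joint.items() if xx == x}
            for x in px}


# Hypothetical toy data: (gender, city, favorite color) per user.
users = ([("m", "A", "red")] * 10 + [("m", "B", "red")] * 10
         + [("f", "A", "blue")] * 10 + [("f", "B", "green")] * 10)

mi_gender = mutual_information([(g, col) for g, _, col in users])
mi_city = mutual_information([(c, col) for _, c, col in users])
mi_both = mutual_information([((g, c), col) for g, c, col in users])
print(mi_gender, mi_city, mi_both)
print(conditional_table([(g, col) for g, _, col in users]))
```

Passing `(gender, city)` tuples as the first element measures how much the pair jointly tells you about color, which can exceed what either feature tells you alone. With small samples the plug-in estimate of mutual information is biased upward, so a permutation test or a chi-squared test of independence is a common way to judge whether a value is meaningful.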

Could anyone recommend suitable methods for this unsupervised learning task?

[heavily edited in an effort to revive and resolve]


Source: Link, Question Author: Community, Answer Author: Community
