Clustering mixed-type data in R

I wonder whether it is possible to cluster data with mixed variable types in R. In other words, I have a data set containing both numerical and categorical variables, and I'm looking for the best way to cluster them. In SPSS I would use TwoStep Cluster. Is there a similar technique in R? I was told about the poLCA package, but I'm not sure …

Answer

This may come late, but try the klaR package (http://cran.r-project.org/web/packages/klaR/index.html):

install.packages("klaR")  # package names are case-sensitive: "klaR", not "klar"
library(klaR)

It uses the non-hierarchical k-modes algorithm, which uses simple matching as its distance function: the distance between two data points x and y is the number of variables on which they differ, where the contribution δ of a single variable m is


$$
\delta(x_m,y_m) = \begin{cases}
1 & x_m \neq y_m,\\
0 & \text{otherwise.}
\end{cases}
$$
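A minimal base-R sketch of that distance, summing δ over all variables, may help. The function name `matching_dist` is hypothetical and is not klaR's internal code; it simply counts mismatching values between two records of equal length.

```r
# Hypothetical helper (not klaR's internals): simple matching distance,
# i.e. the sum over variables m of delta(x_m, y_m) = number of mismatches.
matching_dist <- function(x, y) {
  stopifnot(length(x) == length(y))
  # compare as character so factors and strings are treated alike
  sum(as.character(x) != as.character(y))
}

x <- c(colour = "red", size = "S", shape = "round")
y <- c(colour = "red", size = "L", shape = "square")
matching_dist(x, y)  # 2: size and shape differ
```

Because every variable contributes either 0 or 1, the distance is bounded by the number of variables and does not depend on how the categories are coded.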

There is a flaw in the package: if two data points are at the same distance from a cluster center, the one that comes first in your data is chosen rather than a random one. You can easily modify that part of the code.
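As a sketch of that kind of change: base R's `which.min()` returns the first of several equal minima, which is the "first in your data" behaviour. A replacement that picks one of the tied positions at random could look like the hypothetical helper below. Whether klaR's code uses `which.min()` at that exact point is an assumption; check the source before patching.

```r
# Hypothetical helper: like which.min(), but when several entries share the
# minimum it returns one of them uniformly at random instead of the first.
which_min_random <- function(d) {
  tied <- which(d == min(d))
  # index into `tied` via sample.int() to avoid sample()'s 1:n behaviour
  # when only a single candidate is left
  tied[sample.int(length(tied), 1L)]
}

d <- c(3, 1, 4, 1, 5)
which.min(d)          # always 2, the first of the tied minima
which_min_random(d)   # 2 or 4, chosen at random
```

If the distances are held in a points-by-centers matrix instead, base R's `max.col(-d, ties.method = "random")` does the same row by row.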

To handle mixed-type data, you will need to go into the code and modify the distance function so that it distinguishes numeric from non-numeric modes and variables and treats each type appropriately.
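One common way to define such a distance, swapped in here and not taken from klaR, is the cost used by Huang's k-prototypes algorithm: squared Euclidean distance on the numeric variables plus a weight γ times the simple-matching distance on the categorical ones. The function name `mixed_dist` and the default `gamma = 1` are hypothetical choices for illustration.

```r
# Hypothetical mixed-type distance in the spirit of k-prototypes:
# squared Euclidean distance on numeric columns plus gamma times the
# simple-matching distance on the remaining (categorical) columns.
# x and y are one-row data frames with the same columns.
mixed_dist <- function(x, y, gamma = 1) {
  is_num <- vapply(x, is.numeric, logical(1))
  num_part <- sum((unlist(x[is_num]) - unlist(y[is_num]))^2)
  cat_part <- sum(as.character(unlist(x[!is_num])) !=
                  as.character(unlist(y[!is_num])))
  num_part + gamma * cat_part
}

a <- data.frame(age = 30, income = 2.0, sex = "F", region = "N")
b <- data.frame(age = 32, income = 2.5, sex = "F", region = "S")
mixed_dist(a, b, gamma = 0.5)  # (4 + 0.25) + 0.5 * 1 = 4.75
```

In practice the numeric variables are usually standardized first, and γ is tuned so that neither part dominates. In k-prototypes the cluster center then uses the mean for numeric variables and the mode for categorical ones.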

Attribution
Source: Link, Question Author: Giorgio Spedicato, Answer Author: rightskewed
