I wonder whether it is possible to cluster data with mixed variable types in R. In other words, I have a data set containing both numerical and categorical variables, and I am looking for the best way to cluster them. In SPSS I would use two-step cluster. Is there a similar technique in R? I was told about the poLCA package, but I'm not sure …

**Answer**

This may come late, but try klaR (http://cran.r-project.org/web/packages/klaR/index.html):

```
# Package names are case-sensitive: the package is "klaR", not "klar"
install.packages("klaR")
```
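Once the package is installed, a call might look like the sketch below. The toy data frame (its column names and levels) is invented purely for illustration, and the clustering step is guarded so the snippet still runs if klaR is not available.

```r
# Toy categorical data; column names and levels are made up for illustration.
set.seed(1)
toy <- data.frame(
  colour = factor(sample(c("red", "green", "blue"), 60, replace = TRUE)),
  size   = factor(sample(c("S", "M", "L"), 60, replace = TRUE)),
  shape  = factor(sample(c("round", "square"), 60, replace = TRUE))
)

# Only run the clustering if klaR is installed.
if (requireNamespace("klaR", quietly = TRUE)) {
  fit <- klaR::kmodes(toy, modes = 3, iter.max = 10)
  print(fit$modes)           # one row of modal categories per cluster
  print(table(fit$cluster))  # number of observations per cluster
}
```

The initial modes are drawn at random from the data, so results can differ between runs unless a seed is set.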

The package's `kmodes()` function implements the non-hierarchical k-modes algorithm, which uses simple matching as its distance function: the distance δ between two data points x and y on a variable *m* is given by

$$
\delta(x_m, y_m) =
\begin{cases}
1 & x_m \neq y_m,\\
0 & \text{otherwise}
\end{cases}
$$
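As a rough base-R sketch of this definition (the helper names `simple_matching` and `matching_distance` are hypothetical, not part of klaR), the per-variable distances can be summed to get the dissimilarity between two data points:

```r
# Per-variable simple matching: 1 where the values differ, 0 where they agree.
simple_matching <- function(x, y) {
  stopifnot(length(x) == length(y))
  as.integer(as.character(x) != as.character(y))
}

# Dissimilarity between two data points = number of mismatching variables.
matching_distance <- function(x, y) sum(simple_matching(x, y))

x <- c(colour = "red", size = "M", shape = "round")
y <- c(colour = "red", size = "L", shape = "square")

simple_matching(x, y)    # 0 1 1
matching_distance(x, y)  # 2
```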

The package has one flaw: if two data points are at the same distance from a cluster center, the first one in your data is chosen rather than a random one. You can easily modify that bit of the code.
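One way such a change could look, as a sketch only: the snippet assumes the tie is currently resolved by a "first minimum wins" rule such as `which.min()`, which I have not checked against klaR's source. The helper `which_min_random` is hypothetical.

```r
# Index of the minimum, with ties broken at random rather than by position.
which_min_random <- function(d) {
  ties <- which(d == min(d))
  # sample() on a single number n samples from 1:n, so handle that case first.
  if (length(ties) == 1L) ties else sample(ties, 1L)
}

d <- c(2, 1, 1, 3)   # positions 2 and 3 are tied for the minimum
which.min(d)         # always returns 2, the first of the tied positions
which_min_random(d)  # returns 2 or 3, each with probability 1/2
```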

To accommodate mixed-variable clustering, you will need to go into the code and modify the distance function so that it distinguishes numeric from non-numeric variables and modes.
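A minimal sketch of what such a distance could look like, loosely in the spirit of k-prototypes rather than anything klaR provides: squared differences on numeric variables plus weighted simple matching on the rest. The function `mixed_distance`, the weight `gamma`, and the example columns are all assumptions made for illustration.

```r
# Distance between one observation and one cluster centre, both given as
# single-row data frames with identical columns.
mixed_distance <- function(x, centre, gamma = 1) {
  stopifnot(is.data.frame(x), nrow(x) == 1L,
            is.data.frame(centre), nrow(centre) == 1L,
            identical(names(x), names(centre)))
  is_num <- vapply(x, is.numeric, logical(1))
  # Numeric part: squared Euclidean distance.
  num_part <- sum((unlist(x[is_num]) - unlist(centre[is_num]))^2)
  # Categorical part: number of mismatches (simple matching).
  cat_part <- sum(vapply(names(x)[!is_num], function(v) {
    as.character(x[[v]]) != as.character(centre[[v]])
  }, logical(1)))
  num_part + gamma * cat_part
}

x      <- data.frame(age = 30, income = 2.5, sex = "f", region = "north")
centre <- data.frame(age = 32, income = 2.0, sex = "f", region = "south")
mixed_distance(x, centre)  # (30 - 32)^2 + (2.5 - 2.0)^2 + 1 * 1 = 5.25
```

In practice the numeric variables would usually be standardised first, and `gamma` tuned so that neither variable type dominates the distance. The centre update would also need to use means for numeric columns and modes for categorical ones.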

**Attribution**

*Source: Link, Question Author: Giorgio Spedicato, Answer Author: rightskewed*