I understand that the main difference between k-means and the Gaussian mixture model (GMM) is that k-means only detects spherical clusters while a GMM can adjust itself to elliptical clusters. However, how do they differ when the GMM has spherical covariance matrices?
Ok, we need to start off by talking about models and estimators and algorithms.
- A model is a set of probability distributions, usually chosen because you think the data came from a distribution like one in the set. Models typically have parameters that specify which distribution you mean from the set. I’ll write \theta for the parameters.
- An estimator of a parameter is something you can compute from the data that you think will be close to the parameter. Write \hat\theta for an estimator of \theta.
- An algorithm is a recipe for computing something from the data, usually something you hope will be useful.
The Gaussian mixture model is a model. It is an assumption about (or approximation to) how the data, and often future data, were generated. Data from a Gaussian mixture model tend to fall into elliptical (or spherical) clumps.
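To make that concrete, here is a minimal sketch (with made-up weights, means, and a shared spherical standard deviation) of how data from a spherical Gaussian mixture are generated: pick a component according to the mixing weights, then draw from a Gaussian centred at that component's mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: mixing weights, component means, spherical sd
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
sigma = 1.0  # spherical: covariance is sigma^2 * I for every component

n = 1000
# Draw a component label for each point, then a Gaussian draw about its mean
labels = rng.choice(len(weights), size=n, p=weights)
data = means[labels] + sigma * rng.standard_normal((n, 2))
```

Plotting `data` would show three roughly circular clumps, one per mixture component.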
k-means is an algorithm. Given a data set, it divides it into k clusters in a way that attempts to minimise the average squared Euclidean distance from a point to the centre of its cluster.
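The standard recipe is Lloyd's algorithm: alternate between assigning each point to its nearest centre and moving each centre to the mean of its assigned points. A minimal NumPy sketch (the function and its details are mine, not any particular library's):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean updates."""
    rng = np.random.default_rng(seed)
    # Initialise the centres at k distinct data points
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: nearest centre by squared Euclidean distance
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Update step: move each centre to the mean of its assigned points
        new = np.array([X[assign == j].mean(axis=0) if (assign == j).any()
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    # Final assignment, consistent with the returned centres
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return centres, d2.argmin(axis=1)
```

Note that nothing in this recipe mentions probability distributions: it is just an iterative optimisation of the within-cluster sum of squares.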
There’s no necessary relationship between the two, but they are at least good friends. If your data are a good fit to a spherical Gaussian mixture model they come in roughly spherical clumps centered at the means of each mixture component. That’s the sort of data where k-means clustering does well: it will tend to find clusters that each correspond to a mixture component, with cluster centres close to the mixture means.
However, you can use k-means clustering without any assumption about the data-generating process. As with other clustering tools, it can be used just to chop up data into convenient and relatively homogeneous pieces, with no philosophical commitment to those pieces being real things (eg, for market segmentation). You can prove things about what k-means estimates without assuming mixture models (eg, this and this by David Pollard)
You can fit Gaussian mixture models by maximum likelihood, which is a different estimator and a different algorithm from k-means. Or with Bayesian estimators and their corresponding algorithms (see eg)
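As an illustration of the maximum-likelihood route, here is a bare-bones EM sketch for a spherical Gaussian mixture (my own simplified implementation, for exposition only, not a substitute for a proper library routine): the E-step computes soft responsibilities rather than the hard assignments k-means makes, and the M-step re-estimates weights, means, and a scalar variance per component.

```python
import numpy as np

def em_spherical_gmm(X, k, n_iter=200, seed=0):
    """EM for a spherical Gaussian mixture: alternate soft assignments
    (E-step) and weighted parameter updates (M-step)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    var = np.full(k, X.var())        # one scalar variance per component
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        log_r = np.log(weights) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
        log_r -= log_r.max(axis=1, keepdims=True)   # stabilise before exp
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted weights, means, and variances
        nk = r.sum(axis=0)
        means = (r.T @ X) / nk[:, None]
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        var = (r * d2).sum(axis=0) / (d * nk)
        weights = nk / n
    return weights, means, var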
So: spherical Gaussian mixture models are quite closely connected to k-means clustering in some ways. In other ways they are not just different things but different kinds of things.