# Clusterings that can be caused by K-means

I have gotten the following question as a test question for my exam and I simply cannot understand the answer.

A scatter plot of the data projected onto the first two principal components
is shown below. We wish to examine if there exists some group structure in the data set. To do this, we have run the k-means algorithm with k = 2 using the Euclidean distance measure. The result of the k-means algorithm can vary between runs depending on the random initial conditions. We ran the algorithm several times and got some different clustering results.

Only three of the four clusterings shown can be obtained by running the k-means algorithm on the data. Which one can not be obtained by k-means?
(there’s nothing special about the data) The correct answer is D.
Can any of you explain why?

To put some more meat on Peter Flom’s answer, k-means clustering looks for k groups in data. The method assumes that Each cluster has a centroid at a certain `(x,y)`. The k-means algorithm minimizes the distance of each point to the centroid (this could be euclidian or manhattan distance depending on your data).
So, in your case it looks like case A had initial conditions that were widely separated in `x` and so the clusters resolve because the distances from the centroids to the data are small, and it’s a stable solution. Conversely, you can’t obtain D because that single red point is closer to the centroid of the blue points than many others, so the red point should have become part of the blue set.