# Cosine Distance as Similarity Measure in KMeans [duplicate]

I am currently solving a problem where I have to use cosine distance as the similarity measure for k-means clustering. However, the standard `KMeans` implementation in scikit-learn uses Euclidean distance and does not allow you to change this.

My understanding is that if I normalise my original dataset with the code below and then run `KMeans` (which uses Euclidean distance) on the normalised data, the result will be the same as if I had changed the distance metric to cosine distance. Is that correct?

from sklearn import cluster, preprocessing

# L2-normalise each row of X so every sample lies on the unit sphere
X_Norm = preprocessing.normalize(X)

km2 = cluster.KMeans(n_clusters=5, init='random').fit(X_Norm)
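As a quick sanity check of the normalisation step, a minimal sketch (the random `X` here is a hypothetical stand-in for the real dataset):

```python
import numpy as np
from sklearn import preprocessing

rng = np.random.default_rng(0)
X = rng.random((6, 4))               # hypothetical toy data
X_Norm = preprocessing.normalize(X)  # L2-normalises each row by default

# every row of X_Norm should now have unit Euclidean norm
print(np.allclose(np.linalg.norm(X_Norm, axis=1), 1.0))  # True
```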


Please let me know if my mathematical understanding of this is incorrect.

Cosine distance is usually defined as one minus cosine similarity, $d_{\cos}(x,y) = 1 - \cos(x,y)$, where $\cos(x,y) = \frac{\sum x_iy_i}{\sqrt{\sum x_i^2 \sum y_i^2}}$.

Now, let's see what Euclidean distance gives us for normalised vectors ($\sum x_i^2 = \sum y_i^2 = 1$).
Note that for normalised vectors $\cos(x,y) = \frac{\sum x_iy_i}{\sqrt{\sum x_i^2 \sum y_i^2}} = \sum x_iy_i$, so

$$\|x-y\|^2 = \sum (x_i - y_i)^2 = \sum x_i^2 + \sum y_i^2 - 2\sum x_iy_i = 2 - 2\cos(x,y).$$

Squared Euclidean distance between unit vectors is therefore a monotone function of cosine similarity, so running k-means on L2-normalised data minimises an objective equivalent to one based on cosine distance.
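For unit vectors, $\|x-y\|^2 = 2 - 2\cos(x,y)$; this can be checked numerically with plain NumPy (the vectors below are arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(5)
y = rng.random(5)

# L2-normalise both vectors so that sum(x_i^2) = sum(y_i^2) = 1
x = x / np.linalg.norm(x)
y = y / np.linalg.norm(y)

cos_sim = np.dot(x, y)            # cosine similarity of unit vectors
sq_euclid = np.sum((x - y) ** 2)  # squared Euclidean distance

# identity: ||x - y||^2 = 2 - 2*cos(x, y) for normalised vectors
print(np.isclose(sq_euclid, 2 - 2 * cos_sim))  # True
```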