Is there any way to determine the optimal cluster number or should I just try different values and check the error rates to decide on the best value?

**Answer**

The method I use is to use CCC (Cubic Clustering Criteria). I look for CCC to increase to a maximum as I increment the number of clusters by 1, and then observe when the CCC starts to decrease. At that point I take the number of clusters at the (local) maximum. This would be similar to using a scree plot to picking the number of principal components.

SAS Technical Report A-108 Cubic Clustering Criterion (pdf)

n = number of observations

nk = number in cluster k

p = number of variables

q = number of clusters

X = n×p data matrix

M = q×p matrix of cluster means

Z = cluster indicator (zik=1 if obs. i in cluster k, 0 otherwise)

Assume each variable has mean 0:

Z′Z=diag(n1,⋯,nq), M=(Z′Z)−1Z′X

SS(total) matrix = T= X′X

SS(between clusters) matrix = B = M′Z′ZM

SS(within clusters) matrix = W = T−B

R^2 = 1 – \frac{\text{trace(W)}}{\text{trace}(T)}

(trace = sum of diagonal elements)

Stack columns of X into one long column.

Regress on Kronecker product of Z with p\times p identity matrix

Compute R^2 for this regression – same R^2

The CCC idea is to compare the R^2 you get for a given set of clusters with the R^2 you would get by clustering a uniformly distributed set of points in p dimensional space.

**Attribution***Source : Link , Question Author : berkay , Answer Author : gung – Reinstate Monica*