I am using hierarchical clustering to analyze time series data. My code is implemented using the Mathematica function
DirectAgglomerate[...], which generates hierarchical clusters given the following inputs:
a distance matrix D
the name of the method used to determine inter-cluster linkage.
I have calculated the distance matrix D using Manhattan distance:
d(x,y) = \sum_i|x_i – y_i|
where i = 1,\cdots, n and n \approx 150 is the number of data points in my time series.
My question is, is it ok to use Ward’s inter-cluster linkage with a Manhattan distance matrix? Some sources suggest that Ward’s linkage should only be used with Euclidean distance.
DirectAgglomerate[...]calculates Ward’s linkage using the distance matrix only, not the original observations. Unfortunately, I am unsure how Mathematica modifies Ward’s original algorithm, which (from my understanding) worked by minimizing the error sum of squares of the observations, calculated with respect to the cluster mean. For example, for a cluster c consisting of a vector of univariate observations, Ward formulated the error sum of squares as:
(\sum_j||c_j – mean(c)||_2)^2
(Other software tools such as Matlab and R also implement Ward’s clustering using just a distance matrix so the question isn’t specific to Mathematica.)
The Ward clustering algorithm is a hierarchical clustering method that minimizes an ‘inertia’ criteria at each step. This inertia quantifies the sum of squared residuals between the reduced signal and the initial signal: it is a measure of the variance of the error in an l2 (Euclidean) sens. Actually, you even mention it in your question. This is why, I believe, it makes no sens to apply it to a distance matrix that is not a l2 Euclidean distance.
On the other hand, an average linkage or a single linkage hierarchical clustering would be perfectly suitable for other distances.