I am using hierarchical clustering to analyze time series data. My code is implemented using the

Mathematicafunction`DirectAgglomerate[...]`

, which generates hierarchical clusters given the following inputs:

a distance matrix D

the name of the method used to determine inter-cluster linkage.

I have calculated the distance matrix D using Manhattan distance:

d(x,y) = \sum_i|x_i – y_i|

where i = 1,\cdots, n and n \approx 150 is the number of data points in my time series.

My question is, is it ok to use Ward’s inter-cluster linkage with a Manhattan distance matrix? Some sources suggest that Ward’s linkage should only be used with Euclidean distance.

Note that

`DirectAgglomerate[...]`

calculates Ward’s linkage using the distance matrix only, not the original observations. Unfortunately, I am unsure howMathematicamodifies Ward’s original algorithm, which (from my understanding) worked by minimizing the error sum of squares of the observations, calculated with respect to the cluster mean. For example, for a cluster c consisting of a vector of univariate observations, Ward formulated the error sum of squares as:(\sum_j||c_j – mean(c)||_2)^2

(Other software tools such as Matlab and Ralsoimplement Ward’s clustering using just a distance matrix so the question isn’t specific to Mathematica.)

**Answer**

The Ward clustering algorithm is a hierarchical clustering method that minimizes an ‘inertia’ criteria at each step. This inertia quantifies the sum of squared residuals between the reduced signal and the initial signal: it is a measure of the variance of the error in an l2 (Euclidean) sens. Actually, you even mention it in your question. This is why, I believe, it makes no sens to apply it to a distance matrix that is not a l2 Euclidean distance.

On the other hand, an average linkage or a single linkage hierarchical clustering would be perfectly suitable for other distances.

**Attribution***Source : Link , Question Author : Rachel , Answer Author :
Gael Varoquaux
*