The documentation of hclust() in R, describing the two Ward algorithms found in the literature, says:

The one used by option “ward.D” (equivalent to the only Ward option
“ward” in R versions <= 3.0.3) does not implement Ward’s (1963)
clustering criterion, whereas option “ward.D2” implements that
criterion (Murtagh and Legendre 2014).
So ward.D apparently does not implement Ward’s criterion properly. Nonetheless, the clusterings it produces seem good. If it is not Ward’s criterion, what does method = "ward.D" implement?
Murtagh, F., & Legendre, P. (2014). Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? Journal of Classification, 31(3), 274-295.
The relevant manuscript is Murtagh and Legendre (2014), cited in the question.
The difference between ward.D and ward.D2 is the difference between the two algorithms that the manuscript calls Ward1 and Ward2.
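Both algorithms use the Lance-Williams update for Ward's method. The standard form of that update is given below from general knowledge, not quoted from the manuscript. When clusters i and j (sizes n_i and n_j) are merged, the dissimilarity from any other cluster k (size n_k) to the new cluster becomes:

```latex
d(k,\, i \cup j) = \frac{(n_i + n_k)\, d(k,i) + (n_j + n_k)\, d(k,j) - n_k\, d(i,j)}{n_i + n_j + n_k}
```

Ward1 (ward.D) feeds the input dissimilarities straight into this update. Ward2 (ward.D2) squares them first, applies the same update, and reports the square root of each merge height. The update follows Ward's minimum-variance criterion only when d is a squared Euclidean distance. ward.D therefore behaves as Ward's method exactly when its input is already squared.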
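In short, only Ward2 (ward.D2) implements Ward's criterion directly. Ward1 (ward.D) applies the same cluster-updating formula to the dissimilarities exactly as it receives them, which is correct only if they are already squared Euclidean distances. It therefore yields the Ward clustering if the Euclidean distances returned by dist() are squared before they are passed to hclust() with method = "ward.D". The merge heights are then the squares of those produced by ward.D2.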
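The equivalence can be checked with a minimal Python sketch. It assumes a naive O(n^3) agglomeration and is not R's actual hclust implementation. The names ward_lw, ward_d and ward_d2 and the sample points are hypothetical. The sketch applies the Lance-Williams Ward update either to the input as given (the ward.D analogue) or after squaring it (the ward.D2 analogue).

```python
import math
from itertools import combinations


def _key(a, b):
    """Store each pair of cluster ids once, smaller id first."""
    return (a, b) if a < b else (b, a)


def ward_lw(diss):
    """Naive agglomerative clustering using the Lance-Williams update for
    Ward's method, applied to the dissimilarities exactly as given.
    Returns the merge heights in merge order."""
    n = len(diss)
    d = {(a, b): diss[a][b] for a, b in combinations(range(n), 2)}
    size = {a: 1 for a in range(n)}  # active cluster id -> number of points
    heights = []
    next_id = n
    while len(size) > 1:
        i, j = min(d, key=d.get)  # closest pair of active clusters
        h = d.pop((i, j))
        heights.append(h)
        ni, nj = size.pop(i), size.pop(j)
        new = next_id  # new id is larger than every existing id
        next_id += 1
        for k, nk in size.items():
            # d(k, i+j) = [(ni+nk) d(k,i) + (nj+nk) d(k,j) - nk d(i,j)] / (ni+nj+nk)
            dki = d.pop(_key(k, i))
            dkj = d.pop(_key(k, j))
            d[(k, new)] = ((ni + nk) * dki + (nj + nk) * dkj - nk * h) / (ni + nj + nk)
        size[new] = ni + nj
    return heights


def ward_d(diss):
    """Ward1 analogue: run the update on the input dissimilarities as they are."""
    return ward_lw(diss)


def ward_d2(diss):
    """Ward2 analogue: square the input, run the update, report sqrt of heights."""
    squared = [[x * x for x in row] for row in diss]
    return [math.sqrt(h) for h in ward_lw(squared)]


points = [(0, 0), (0, 1), (4, 0), (4, 1.5), (10, 0)]
D = [[math.dist(p, q) for q in points] for p in points]  # Euclidean distances
D_sq = [[x * x for x in row] for row in D]                # squared distances

print(ward_d(D_sq))                 # "ward.D" on squared distances
print([h * h for h in ward_d2(D)])  # "ward.D2" heights squared: equal up to rounding
print(ward_d(D))                    # "ward.D" on raw distances: different heights
```

The first two printed lists agree. The third does not, because unsquared Euclidean distances do not satisfy the assumption built into the update. In R, the analogous comparison would presumably be between hclust(dist(x)^2, method = "ward.D")$height and hclust(dist(x), method = "ward.D2")$height^2.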
For example, SPSS also implements Ward1, but it warns users that distances should be squared to obtain the Ward criterion. In that sense the ward.D implementation is not deprecated, and it is probably a good idea to retain it for backward compatibility anyway.