I’ve been staring at the Wikipedia page for distance correlation, where it seems to be characterized by how it can be calculated. While I could do the calculations, I struggle to understand what distance correlation measures and why the calculations look as they do.
Is there a more intuitive characterization of distance correlation (or several) that could help me understand what it measures?
I realize that asking for intuition is a bit vague, but if I knew what kind of intuition I was asking for I would probably not have asked in the first place. I would also be happy for intuition regarding the case of the distance correlation between two random variables (even though distance correlation is defined between two random vectors).
This answer of mine does not answer the question correctly. Please read the comments.
Let us compare the usual covariance with distance covariance. The effective part of each is its numerator. (The denominators are simply averaging.) The numerator of covariance is the summed cross-product (= scalar product) of deviations from one point, the mean: $\sum_i (x_i-\mu^x)(y_i-\mu^y)$ (with superscripted $\mu$ as that centroid). To rewrite the expression in this style: $\sum_i d^x_{i\mu} d^y_{i\mu}$, with $d$ standing for the deviation of point $i$ from the centroid, i.e. its (signed) distance to the centroid. The covariance is defined by the sum of the products of the two distances over all points.
How do things stand with distance covariance? The numerator is, as you know, $\sum_{i,j} d^x_{ij} d^y_{ij}$. Isn’t it very much like what we’ve written above? And what is the difference? Here, the distance $d$ is between varying data points $i$ and $j$, not between a data point and the mean as above. The distance covariance is defined by the sum of the products of the two distances over all pairs of points.
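To make the contrast between the two numerators concrete, here is a minimal Python sketch. The function names `covariance_numerator` and `pairwise_distance_product_sum` are made up for illustration, and the second one follows the simplified form written above, using raw pairwise distances. As far as I know, the standard sample definition of distance covariance first double-centers each distance matrix (subtracting row and column means and adding back the grand mean), so treat this as an illustration of the idea, not as the actual statistic.

```python
import numpy as np

def covariance_numerator(x, y):
    """Sum over points of the products of signed deviations from the mean."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum((x - x.mean()) * (y - y.mean())))

def pairwise_distance_product_sum(x, y):
    """Sum over all pairs (i, j) of |x_i - x_j| * |y_i - y_j|.

    Simplified form from the text: raw distances, no double centering.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.abs(x[:, None] - x[None, :])  # n x n matrix of distances in x
    dy = np.abs(y[:, None] - y[None, :])  # n x n matrix of distances in y
    return float(np.sum(dx * dy))

# A symmetric parabola: a perfect but nonlinear dependence.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

cov_num = covariance_numerator(x, y)            # deviations cancel out
pair_sum = pairwise_distance_product_sum(x, y)  # unsigned, cannot cancel
print(cov_num, pair_sum)
```

On the parabola the deviations from the mean cancel, so the covariance numerator vanishes, while the pairwise sum stays positive. Note that the raw pairwise sum is positive for almost any non-constant data, dependent or not, which is presumably why the centering step matters in the real definition.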
The scalar product (between two entities, in our case the variables x and y) based on co-distance from one fixed point is maximized when the data are arranged along one straight line. The scalar product based on co-distance from a variable point is maximized when the data are arranged along a straight line locally, piecewise; in other words, when the data overall form a chain of any shape, a dependency of any shape.
And indeed, the usual covariance is bigger when the relationship is closer to perfectly linear and the variances are bigger. If you standardize the variances to a fixed unit, the covariance depends only on the strength of linear association, and it is then called Pearson correlation. And, as we know (and have just gained some intuition why), distance covariance is bigger when the relationship is closer to a perfect curve and the data spreads are bigger. If you standardize the spreads to a fixed unit, the distance covariance depends only on the strength of some curvilinear association, and it is then called Brownian (distance) correlation.
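The difference between the two standardized measures can be checked numerically. Below is a rough sketch, assuming the usual sample definition of distance correlation with double-centered distance matrices (the definition given on the Wikipedia page, not the simplified numerator used in this answer). The helper names `double_centered_distances` and `distance_correlation` are my own, not from any library.

```python
import numpy as np

def double_centered_distances(v):
    """Pairwise |v_i - v_j| with row means and column means removed
    and the grand mean added back."""
    v = np.asarray(v, dtype=float)
    d = np.abs(v[:, None] - v[None, :])
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples of equal length."""
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0           # at least one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))

x = np.linspace(-1.0, 1.0, 201)
y = x ** 2                             # perfect, but nonlinear, dependence

pearson = float(np.corrcoef(x, y)[0, 1])
dcor_curve = distance_correlation(x, y)
dcor_line = distance_correlation(x, 2.0 * x + 1.0)
print(pearson, dcor_curve, dcor_line)
```

For the symmetric parabola, Pearson correlation is zero up to rounding, while distance correlation is clearly positive. For an exactly linear relationship, distance correlation equals 1, just as the absolute Pearson correlation does.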