# Is there an intuitive characterization of distance correlation?

I’ve been staring at the Wikipedia page for distance correlation, where it seems to be characterized only by how it is calculated. While I could do the calculations, I struggle to see what distance correlation measures and why the calculations look the way they do.

Is there a more intuitive characterization (or several) of distance correlation that could help me understand what it measures?

I realize that asking for intuition is a bit vague, but if I knew what kind of intuition I was asking for I would probably not have asked in the first place. I would also be happy for intuition regarding the case of the distance correlation between two random variables (even though distance correlation is defined between two random vectors).

Let us compare the usual covariance with distance covariance. The operative part of each is its numerator. (The denominators merely average.) The numerator of the covariance is the summed cross-product (= scalar product) of deviations from one point, the mean: $\Sigma (x_i-\mu^x)(y_i-\mu^y)$ (with the superscripted $\mu$ denoting that centroid). Rewritten in terms of distances, this is $\Sigma d_{i\mu}^x d_{i\mu}^y$, with $d$ standing for the deviation of point $i$ from the centroid, i.e. its (signed) distance to the centroid. The covariance is defined by the sum of the products of the two distances over all points.
How do things stand with distance covariance? The numerator is, as you know, $\Sigma d_{ij}^x d_{ij}^y$. Isn’t it very much like what we’ve written above? So what is the difference? Here, the distance $d$ is between two data points $i$ and $j$, not between a data point and the mean as above. The distance covariance is defined by the sum of the products of the two distances over all pairs of points.
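
For completeness, in the standard sample definition the $d_{ij}$ entering that sum are pairwise distances that have first been double-centered (row and column means subtracted, grand mean added back). A sketch of this, where the symbols $a_{ij}$, $A_{ij}$, $B_{ij}$ and $n$ are introduced here for illustration and do not appear in the text above:

$$a_{ij} = |x_i - x_j|, \qquad A_{ij} = a_{ij} - \bar a_{i\cdot} - \bar a_{\cdot j} + \bar a_{\cdot\cdot}$$

$$\mathrm{dCov}_n^2(x, y) = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij} B_{ij}$$

with $B_{ij}$ built from the $y$ distances in the same way. So $d_{ij}^x$ and $d_{ij}^y$ play the roles of $A_{ij}$ and $B_{ij}$, and the centering is what allows the products to be negative as well as positive.
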
A scalar product (between two entities; in our case, the variables $x$ and $y$) based on co-distance from one fixed point is maximized when the data are arranged along one straight line. A scalar product based on co-distance from a *variable* point is maximized when the data are arranged along a straight line locally, piecewise; in other words, when the data as a whole form a chain of any shape, that is, a dependency of any shape.
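
To see this contrast numerically, here is a minimal sketch in Python with NumPy. It assumes the standard sample (V-statistic) form of distance correlation with double-centered distance matrices; the helper names `double_center` and `distance_correlation` and the two toy data sets are made up for this illustration. It compares a straight line with a parabola, which is a "chain" that is straight only locally.

```python
import numpy as np

def double_center(d):
    """Subtract row and column means from a square matrix, add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples (assumed V-statistic form)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Distances between every pair of points i, j, then double-centered
    a = double_center(np.abs(x[:, None] - x[None, :]))
    b = double_center(np.abs(y[:, None] - y[None, :]))
    dcov2_xy = (a * b).mean()   # averaged sum of products over all pairs
    dvar2_x = (a * a).mean()
    dvar2_y = (b * b).mean()
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0              # a constant sample carries no dependence
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))

x = np.linspace(-1.0, 1.0, 201)
y_line = 2.0 * x + 1.0          # one straight line
y_parabola = x ** 2             # straight only locally

pearson_line = np.corrcoef(x, y_line)[0, 1]
pearson_parabola = np.corrcoef(x, y_parabola)[0, 1]
dcor_line = distance_correlation(x, y_line)
dcor_parabola = distance_correlation(x, y_parabola)

print(f"line:     Pearson = {pearson_line:.3f}, dCor = {dcor_line:.3f}")
print(f"parabola: Pearson = {pearson_parabola:.3f}, dCor = {dcor_parabola:.3f}")
```

For the straight line both measures reach their maximum of 1. For the symmetric parabola the Pearson correlation is zero up to rounding, while the distance correlation stays clearly above zero, because pairs of points that are close in $x$ are also close in $y$.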