# Why does $r^2$ between two variables represent proportion of shared variance?

Firstly, I appreciate that discussions about $r^2$ generally provoke explanations about $R^2$ (i.e., the coefficient of determination in regression). The problem I’m seeking to answer is generalizing that to all instances of correlation between two variables.

So, I’ve been puzzled about shared variance for quite a while. I’ve had a few explanations offered but they all seem problematic:

1. It’s just another term for covariance. This can’t be the case, as factor analysis literature differentiates between PCA and EFA by stating that the latter accounts for shared variance and the former does not (PCA obviously is accounting for covariance in that it is operating over a covariance matrix, so shared variance must be a distinct concept).

2. It is the correlation coefficient squared ($r^2$). See:

This makes slightly more sense. The trouble here is interpreting how that implies it is shared variance. For example, one interpretation of ‘sharing variance’ is ${\rm cov}(A,B)/[{\rm var}(A)+{\rm var}(B)]$. $r^2$ doesn’t reduce to that, or indeed a readily intuitive concept [${\rm cov}(A,B)^2/({\rm var}(A)\times{\rm var}(B))$; which is a 4 dimensional object].

The links above both attempt to explain it via a Ballentine diagram. They don’t help. Firstly, the circles are equally sized (which seems to be important to the illustration for some reason), which doesn’t account for unequal variances. One could assume it is the Ballentine diagrams for the standardized variables, hence equal variance, in which case the overlapping segment would account for the covariance between two standardized variables (the correlation). So $r$, not $r^2$.

TL;DR: Explanations of shared variance say this:

> By squaring the coefficient, you know how much variance, in percentage terms, the two variables share.

Why would that be the case?

One can only guess what one particular author might mean by “shared variance.” We might hope to circumscribe the possibilities by considering what properties this concept ought (intuitively) to have. We know that “variances add”: the variance of a sum $X+\varepsilon$ is the sum of the variances of $X$ and $\varepsilon$ when $X$ and $\varepsilon$ have zero covariance. It is natural to define the “shared variance” of $X$ with the sum to be the fraction of the variance of the sum represented by the variance of $X$. This is enough to imply the shared variances of any two random variables $X$ and $Y$ must be the square of their correlation coefficient.

This result gives meaning to the interpretation of a squared correlation coefficient as a “shared variance”: in a suitable sense, it really is a fraction of a total variance that can be assigned to one variable in the sum.

The details follow.

### Principles and their implications

Of course if $Y=X$, their “shared variance” (let’s call it “SV” from now on) ought to be 100%. But what if $Y$ and $X$ are just scaled or shifted versions of one another? For instance, what if $Y$ represents the temperature of a city in degrees F and $X$ represents the temperature in degrees C? I would like to suggest that in such cases $X$ and $Y$ should still have 100% SV, so that this concept will remain meaningful regardless of how $X$ and $Y$ might be measured:

$$\operatorname{SV}(\alpha + \beta X,\ \gamma + \delta Y) = \operatorname{SV}(X, Y) \tag{1}$$

for any numbers $\alpha, \gamma$ and nonzero numbers $\beta, \delta$.
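The squared correlation coefficient is one quantity with this invariance. Here is a minimal sketch that checks it numerically, assuming simulated temperatures (the data and the helper names `mean`, `cov` and `r_squared` are made up for illustration):

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Population covariance (divide by n); the n vs. n-1 choice cancels in r^2.
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def r_squared(a, b):
    return cov(a, b) ** 2 / (cov(a, a) * cov(b, b))

rng = random.Random(0)
celsius = [rng.gauss(15, 8) for _ in range(1000)]       # hypothetical city temperatures
fahrenheit = [32 + 1.8 * c for c in celsius]            # same quantity, different units
other = [0.5 * c + rng.gauss(0, 5) for c in celsius]    # some related variable
rescaled_other = [7 - 3 * v for v in other]             # shifted, scaled, sign-flipped

# A variable and its affine transform: r^2 equals 1 up to floating-point rounding.
print(r_squared(celsius, fahrenheit))

# Rescaling both variables leaves r^2 unchanged.
print(r_squared(celsius, other), r_squared(fahrenheit, rescaled_other))
```

The negative multiplier in `rescaled_other` flips the sign of $r$ but not of $r^2$, which is why the principle only needs $\beta$ and $\delta$ to be nonzero.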

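Another principle might be that when $\varepsilon$ is a random variable independent of $X$, then the variance of $X+\varepsilon$ can be uniquely decomposed into two non-negative parts,

$$\operatorname{Var}(X+\varepsilon) = \operatorname{Var}(X) + \operatorname{Var}(\varepsilon),$$

suggesting we attempt to define SV in this special case as

$$\operatorname{SV}(X,\ X+\varepsilon) = \frac{\operatorname{Var}(X)}{\operatorname{Var}(X) + \operatorname{Var}(\varepsilon)}. \tag{2}$$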

Since all these criteria are only up to second order (they involve only the first and second moments of the variables, in the form of expectations and variances), let’s relax the requirement that $X$ and $\varepsilon$ be independent and only demand that they be uncorrelated. This will make the analysis much more general than it otherwise might be.
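To see the special case in numbers, here is a small sketch. It assumes simulated data and makes $\varepsilon$ exactly uncorrelated with $X$ in the sample by removing the fitted linear part of some random noise; the variable names are illustrative only.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Population covariance (divide by n).
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

rng = random.Random(1)
x = [rng.gauss(0, 2) for _ in range(500)]
noise = [rng.gauss(0, 1) for _ in range(500)]

# Strip from the noise its in-sample mean and its linear dependence on x,
# so that eps has zero mean and zero sample covariance with x.
slope = cov(noise, x) / cov(x, x)
mx, mn = mean(x), mean(noise)
eps = [e - mn - slope * (xi - mx) for xi, e in zip(x, noise)]

total = [xi + e for xi, e in zip(x, eps)]
var_x, var_eps, var_total = cov(x, x), cov(eps, eps), cov(total, total)

sv = var_x / (var_x + var_eps)                   # fraction of Var(X + eps) due to X
r2 = cov(x, total) ** 2 / (var_x * var_total)    # squared correlation of X and X + eps

print(var_total, var_x + var_eps)   # variances add when cov(x, eps) = 0
print(sv, r2)                       # the two agree up to rounding
```

Only zero covariance is used here, not independence, which is the relaxation described above.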

### The results

These principles, if you accept them, lead to a unique, familiar, interpretable concept. The trick will be to reduce the general case to the special case of a sum, where we can apply definition $(2)$.

Given $(X,Y)$, we simply attempt to decompose $Y$ into a scaled, shifted version of $X$ plus a variable that is uncorrelated with $X$: that is, let’s find (if it’s possible) constants $\alpha$ and $\beta$ and a random variable $\varepsilon$ for which

$$Y = \alpha + \beta X + \varepsilon \tag{3}$$

with $\operatorname{Cov}(X, \varepsilon)=0$. For the decomposition to have any chance of being unique, we should demand

$$E[\varepsilon] = 0$$

so that once $\beta$ is found, $\alpha$ is determined by

$$\alpha = E[Y] - \beta E[X].$$

This looks an awful lot like linear regression and indeed it is. The first principle says we may rescale $X$ and $Y$ to have unit variance (assuming they each have nonzero variance) and that when it is done, standard regression results assert the value of $\beta$ in $(3)$ is the correlation of $X$ and $Y$:

$$\beta = \rho(X,Y).$$

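Moreover, taking the variances of both sides of $(3)$, with $X$ and $Y$ standardized to unit variance, gives

$$1 = \operatorname{Var}(Y) = \beta^2 \operatorname{Var}(X) + \operatorname{Var}(\varepsilon) = \beta^2 + \operatorname{Var}(\varepsilon),$$

implying

$$\operatorname{Var}(\varepsilon) = 1 - \beta^2 = 1 - \rho(X,Y)^2.$$

Consequently

$$\begin{aligned}
\operatorname{SV}(X,Y) &= \operatorname{SV}(X,\ \alpha + \beta X + \varepsilon) && \text{(the decomposition of } Y\text{)}\\
&= \operatorname{SV}(\beta X,\ \beta X + \varepsilon) && \text{(invariance under shifts and nonzero rescalings)}\\
&= \frac{\operatorname{Var}(\beta X)}{\operatorname{Var}(\beta X) + \operatorname{Var}(\varepsilon)} && \text{(the special-case definition of SV)}\\
&= \frac{\beta^2}{\beta^2 + (1 - \beta^2)} = \beta^2 && \text{(since } \operatorname{Var}(\varepsilon) = 1 - \beta^2\text{)}\\
&= \rho(X,Y)^2 && \text{(since } \beta \text{ is the correlation)}.
\end{aligned}$$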

Note that because the regression coefficient on $Y$ (when standardized to unit variance) is $\rho(Y,X)=\rho(X,Y)$, the “shared variance” itself is symmetric, justifying a terminology that suggests the order of $X$ and $Y$ does not matter:

$$\operatorname{SV}(X,Y) = \rho(X,Y)^2 = \rho(Y,X)^2 = \operatorname{SV}(Y,X).$$
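As a numerical sanity check of the whole construction, here is a sketch under stated assumptions: the data are simulated, and `shared_variance` is a hypothetical helper name. It fits the scaled-and-shifted part of one variable by least squares, takes the residual as the uncorrelated remainder, and computes the fraction of variance carried by the fitted part. It skips the standardization step, which the invariance principle makes unnecessary.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    # Population covariance (divide by n).
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)

def shared_variance(x, y):
    """SV(x, y) from the decomposition y = alpha + beta*x + eps with cov(x, eps) = 0."""
    beta = cov(x, y) / cov(x, x)          # least-squares slope
    alpha = mean(y) - beta * mean(x)      # makes the residual have zero mean
    eps = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    var_bx = beta ** 2 * cov(x, x)        # Var(beta * X)
    return var_bx / (var_bx + cov(eps, eps))

rng = random.Random(2)
x = [rng.gauss(10, 3) for _ in range(400)]
y = [3 - 2 * xi + rng.gauss(0, 4) for xi in x]   # negatively related to x

rho = cov(x, y) / (cov(x, x) * cov(y, y)) ** 0.5
sv_xy = shared_variance(x, y)
sv_yx = shared_variance(y, x)

# Squared correlation and both orderings of SV agree up to rounding.
print(rho ** 2, sv_xy, sv_yx)
```

In this simulated example the correlation is negative, yet the shared variance is still a fraction between 0 and 1, and swapping the roles of the two variables gives the same value.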