# Principal component analysis “backwards”: how much variance of the data is explained by a given linear combination of the variables?

I have carried out a principal components analysis of six variables $A$, $B$, $C$, $D$, $E$ and $F$. If I understand correctly, unrotated PC1 tells me what linear combination of these variables describes/explains the most variance in the data and PC2 tells me what linear combination of these variables describes the next most variance in the data and so on.

I’m just curious — is there any way of doing this “backwards”? Let’s say I choose some linear combination of these variables — e.g. $A+2B+5C$, could I work out how much variance in the data this describes?

If we start with the premise that all variables have been centred (standard practice in PCA), then the total variance in the data is just the sum of squares:

$$T = \sum_{i=1}^{N}\left(A_{i}^{2}+B_{i}^{2}+C_{i}^{2}+D_{i}^{2}+E_{i}^{2}+F_{i}^{2}\right)$$

This is, up to the factor $N-1$, the trace of the covariance matrix of the variables, which equals the sum of the eigenvalues of the covariance matrix. This is the same quantity that PCA speaks of in terms of “explaining the data” – i.e. you want your PCs to explain the greatest proportion of the diagonal elements of the covariance matrix. Now if we make this an objective function for a set of predicted values like so:
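As a quick sanity check of the trace/eigenvalue identity, here is a minimal NumPy sketch using simulated centred data (the data and all variable names are illustrative, not from the question):

```python
import numpy as np

# Simulated centred data: 100 observations of 6 variables (A..F).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
X -= X.mean(axis=0)  # centre each column, as assumed throughout

cov = np.cov(X, rowvar=False)            # 6 x 6 sample covariance
eigenvalues = np.linalg.eigvalsh(cov)    # eigvalsh: cov is symmetric

# Total variance = trace of the covariance matrix = sum of its eigenvalues.
total_variance = X.var(axis=0, ddof=1).sum()
assert np.isclose(total_variance, np.trace(cov))
assert np.isclose(total_variance, eigenvalues.sum())
```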

$$S = \sum_{i=1}^{N}\left[\left(A_{i}-\hat{A}_{i}\right)^{2}+\left(B_{i}-\hat{B}_{i}\right)^{2}+\dots+\left(F_{i}-\hat{F}_{i}\right)^{2}\right]$$

Then the first principal component minimises $S$ among all rank 1 fitted values $(\hat{A}_{i},\dots,\hat{F}_{i})$. So it would seem like the appropriate quantity you are after is

$$P = 1-\frac{S}{T}$$

where $T$ is the total sum of squares.

To use your example $A+2B+5C$, we need to turn this linear combination into rank 1 predictions. First you need to normalise the weights to have sum of squares 1. So we replace $(1,2,5,0,0,0)$ (sum of squares $30$) with $\left(\frac{1}{\sqrt{30}},\frac{2}{\sqrt{30}},\frac{5}{\sqrt{30}},0,0,0\right)$. Next we “score” each observation according to the normalised weights:

$$Z_{i}=\frac{1}{\sqrt{30}}A_{i}+\frac{2}{\sqrt{30}}B_{i}+\frac{5}{\sqrt{30}}C_{i}$$

Then we multiply the scores by the weight vector to get our rank 1 prediction.

$$\hat{A}_{i}=\frac{Z_{i}}{\sqrt{30}},\quad \hat{B}_{i}=\frac{2Z_{i}}{\sqrt{30}},\quad \hat{C}_{i}=\frac{5Z_{i}}{\sqrt{30}},\quad \hat{D}_{i}=\hat{E}_{i}=\hat{F}_{i}=0$$

Then we plug these estimates into $S$ and calculate $P$. You can also put this into matrix norm notation, which may suggest a different generalisation. If we set $O$ as the $N\times q$ matrix of observed values of the variables ($q=6$ in your case), and $E$ as the corresponding matrix of rank 1 predictions, then we can define the proportion of variance explained as:

$$P = 1-\frac{||O-E||_{2}^{2}}{||O||_{2}^{2}}$$

where $||\cdot||_{2}$ is the Frobenius matrix norm. So you could “generalise” this by using some other kind of matrix norm, and you would get a different measure of “variation explained”, although it won’t be “variance” per se unless it is a sum of squares.