I am trying to understand Principal Component Analysis (PCA). I found a webpage on PCA that introduces it and the concept of the percentage of variance. However, I am very confused about what “Percentage of Variance” (POV) means. Here are the questions I have:
- Is there a formal definition or mathematical formula for “percentage of variance”?
- Do we only use POV for calculating PCA? Is POV used somewhere else?
- Is POV the same as “Proportion of Variance Explained”? (There are many similar terms online which really confuse me.)
I assume they are referring to the eigenvalues of the corresponding eigenvectors. In PCA, each eigenvalue tells you how much variance is explained by its associated eigenvector. The largest eigenvalue therefore belongs to the eigenvector along whose direction the data vary the most. Accordingly, if you take all eigenvectors together, you can explain all the variance in the data sample. Instead of using the absolute amount of variance explained, as given by the eigenvalue, you can also get relative numbers by first summing all eigenvalues and then dividing an eigenvalue λi by this sum:
$$\frac{\lambda_i}{\sum_{j} \lambda_j}$$
This way you end up with a “percentage of variance” for each eigenvector (multiply the ratio by 100 to express it in percent).
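To make the division described above concrete, here is a minimal sketch in Python with NumPy. It assumes the eigenvalues come from the sample covariance matrix of the data; the function name `proportion_of_variance` and the toy data are made up for illustration and are not from the webpage in question.

```python
import numpy as np

def proportion_of_variance(X):
    """Return the covariance eigenvalues (largest first) and each one's share of their sum."""
    X = np.asarray(X, dtype=float)
    # Sample covariance matrix; rows are observations, columns are features.
    cov = np.cov(X, rowvar=False)
    # eigvalsh is meant for symmetric matrices and returns eigenvalues in
    # ascending order, so reverse them to put the largest first.
    eigvals = np.linalg.eigvalsh(cov)[::-1]
    # Divide each eigenvalue by the sum of all eigenvalues.
    return eigvals, eigvals / eigvals.sum()

# Hypothetical toy data: 200 samples, 3 features with very different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.2])

eigvals, pov = proportion_of_variance(X)
print("percentage of variance per component:", pov * 100)
print("cumulative proportion:", np.cumsum(pov))
```

The shares sum to 1 (100%), which matches the statement that all eigenvectors together explain all the variance. The cumulative sum is what people usually look at when deciding how many components to keep.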