Low-variance components in PCA: are they really just noise? Is there any way to test for it?

I’m trying to decide whether a component of a PCA should be retained or not. There are a gazillion criteria based on the magnitude of the eigenvalue, described and compared e.g. here or here.

However, in my application I know that the small(est) eigenvalue will be small compared to the large(st) one, so all the magnitude-based criteria would reject the small(est) component. That is not what I want. What I am interested in is a method that takes the actual component corresponding to the small eigenvalue into account: is it really “just” noise, as implied in all the textbooks, or is there “something” of potential interest left? If it is really noise, remove it; otherwise keep it, regardless of the magnitude of the eigenvalue.

Is there some kind of established randomness or distribution test for components in PCA that I have been unable to find? Or does anyone know a reason why this would be a silly idea?

Update

Histograms (green) and normal approximations (blue) of components in two use cases: one probably really noise, one probably not “just” noise (yes, the values are small, but probably not random). The largest singular value is ~160 in both cases; the smallest, i.e. this singular value, is 0.0xx – far too small for any of the cut-off methods.

What I’m looking for is a way to formalize this …

[Figure: two histograms – left: probably really “just” noise; right: probably not noise, but may contain interesting bits]
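
For concreteness, the kind of check I have in mind could look like the following minimal sketch (made-up data, not my actual use case): compare the entries of the component against a fitted normal, e.g. with a Shapiro–Wilk test.

```python
# Minimal sketch with made-up data (not the actual use case): test whether
# the entries of a low-variance component look like draws from a normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
noise_like = rng.normal(size=200)                      # left-panel case
structured = np.sin(np.linspace(0, 8 * np.pi, 200))    # right-panel case

for name, comp in [("noise-like", noise_like), ("structured", structured)]:
    w, p = stats.shapiro(comp)    # H0: entries are normally distributed
    print(f"{name}: Shapiro-Wilk W = {w:.3f}, p = {p:.3g}")
# A small p-value rejects normality, hinting at structure beyond noise;
# failing to reject is of course not proof that the component is noise.
```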

Answer

One way of testing the randomness of a small principal component (PC) is to treat it like a signal instead of noise: i.e., try to predict another variable of interest with it. This is essentially principal components regression (PCR).
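
To make the idea concrete, here is a minimal sketch in Python (simulated data; all names are illustrative, nothing here is from the original post): take the scores of the smallest PC, use them to predict an external variable of interest, and test the association with a permutation test.

```python
# Sketch with simulated data: treat the smallest PC's scores as a candidate
# predictor of some external variable and test the association.
import numpy as np

rng = np.random.default_rng(0)
n, p = 300, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))   # correlated data

Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
z_small = (Xc @ Vt.T)[:, -1]       # scores of the smallest PC

# An external variable of interest; here it secretly depends on z_small.
y = 0.5 * z_small + rng.normal(scale=np.std(z_small), size=n)

r_obs = np.corrcoef(z_small, y)[0, 1]
perms = np.array([np.corrcoef(rng.permutation(z_small), y)[0, 1]
                  for _ in range(2000)])
p_val = np.mean(np.abs(perms) >= np.abs(r_obs))
print(f"r = {r_obs:.3f}, permutation p = {p_val:.4f}")
# A clear association means the "noise" component carries signal about y.
```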

In the predictive context of PCR, Lott (1973) recommends selecting PCs so as to maximize $R^2$; Gunst and Mason (1977) focus on $MSE$. PCs with small eigenvalues (even the smallest!) can improve predictions (Hotelling, 1957; Massy, 1965; Hawkins, 1973; Hadi & Ling, 1998; Jackson, 1991), and have proven very interesting in some published predictive applications (Jolliffe, 1982, 2010), including the following (a sketch of such fit-based selection appears after the list):

  • A chemical engineering model using PCs 1, 3, 4, 6, 7, and 8 of 9 total (Smith & Campbell, 1980)
  • A monsoon model using PCs 8, 2, and 10 (in order of importance) out of 10 (Kung & Sharif, 1980)
  • An economic model using PCs 4 and 5 out of 6 (Hill, Fomby, & Johnson, 1977)
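
As a sketch of such fit-based selection (simulated data, illustrative names, and relying on the usual decomposition of the regression sum of squares over orthogonal regressors), one can rank PCs by their contribution to the fit instead of by eigenvalue:

```python
# Sketch: rank PCs by their contribution to the regression sum of squares
# rather than by eigenvalue (the flavour of Lott's R^2 criterion and of
# Hadi & Ling's recommendation). Simulated data, illustrative names.
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 6
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated X

Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt.T                     # orthogonal, mean-zero PC scores

beta = np.array([0.0, 1.0, 0.0, 0.0, 2.0, 0.5])   # truth: PCs 2, 5, 6 matter
y = Z @ beta + rng.normal(size=n)

# With orthogonal regressors the regression SS decomposes additively, so
# each PC's contribution to the fit is (z_k' y)^2 / (z_k' z_k):
ss = (Z.T @ y) ** 2 / (Z ** 2).sum(axis=0)
print("PCs ranked by contribution to fit:", np.argsort(ss)[::-1] + 1)
print("PCs ranked by variance:           ", np.arange(1, p + 1))
```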

The PCs in the examples listed above are numbered according to their eigenvalues’ ranked sizes. Jolliffe (1982) describes a cloud model in which the last component contributes most. He concludes:

The above examples have shown that it is not necessary to find obscure or bizarre data in order for the last few principal components to be important in principal component regression. Rather it seems that such examples may be rather common in practice. Hill et al. (1977) give a thorough and useful discussion of strategies for selecting principal components which should have buried forever the idea of selection based solely on size of variance. Unfortunately this does not seem to have happened, and the idea is perhaps more widespread now than 20 years ago.

Furthermore, excluding small-eigenvalue PCs can introduce bias (Mason & Gunst, 1985). Hadi and Ling (1998) recommend considering regression $SS$ as well; they summarize their article thus:

The basic conclusion of this article is that, in general, the PCs may fail to account for the regression fit. As stated in Theorem 1, it is theoretically possible that the first $(p-1)$ PCs, which can have almost 100% of the variance, contribute nothing to the fit, while the response variable $Y$ may fit perfectly the last PC which is always ignored by the PCR methodology.

The reason for the failure of the PCR in accounting for the variation of the response variable is that the PCs are chosen based on the PCD [principal components decomposition] which depends only on $X$. Thus, if PCR is to be used, it should be used with caution and the selection of the PCs to keep should be guided not only by the variance decomposition but also by the contribution of each principal component to the regression sum of squares.
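
The extreme case described in Theorem 1 is easy to simulate; the following sketch (an illustrative construction, not taken from the paper) builds a dataset in which the first $p-1$ PCs carry essentially all the variance of $X$ while $Y$ is an exact function of the last PC:

```python
# Sketch of the Theorem 1 situation: the first p-1 PCs carry ~100% of the
# variance of X, yet y is an exact function of the last PC alone.
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 4

scales = np.array([50.0, 20.0, 10.0, 1e-3])   # last direction is tiny
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
X = (rng.normal(size=(n, p)) * scales) @ Q

Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt.T

var_share = S**2 / (S**2).sum()
print("variance share of first p-1 PCs:", var_share[:-1].sum())  # ~1.0

y = Z[:, -1]                      # y "fits perfectly the last PC"
for k in range(p):
    r2 = np.corrcoef(Z[:, k], y)[0, 1] ** 2
    print(f"PC{k + 1}: R^2 = {r2:.6f}")
# Only the last PC predicts y at all; dropping it on variance grounds
# would destroy the fit entirely.
```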

I owe this answer to @Scortchi, who corrected my own misconceptions about PC selection in PCR with some very helpful comments, including: “Jolliffe (2010) reviews other ways of selecting PCs.” This reference may be a good place to look for further ideas.

References


– Gunst, R. F., & Mason, R. L. (1977). Biased estimation in regression: an evaluation using mean squared error. Journal of the American Statistical Association, 72(359), 616–628.
– Hadi, A. S., & Ling, R. F. (1998). Some cautionary notes on the use of principal components regression. The American Statistician, 52(1), 15–19. Retrieved from http://www.uvm.edu/~rsingle/stat380/F04/possible/Hadi+Ling-AmStat-1998_PCRegression.pdf.
– Hawkins, D. M. (1973). On the investigation of alternative regressions by principal component analysis. Applied Statistics, 22(3), 275–286.
– Hill, R. C., Fomby, T. B., & Johnson, S. R. (1977). Component selection norms for principal components regression. Communications in Statistics – Theory and Methods, 6(4), 309–334.
– Hotelling, H. (1957). The relations of the newer multivariate statistical methods to factor analysis. British Journal of Statistical Psychology, 10(2), 69–79.
– Jackson, J. E. (1991). A user’s guide to principal components. New York: Wiley.
– Jolliffe, I. T. (1982). Note on the use of principal components in regression. Applied Statistics, 31(3), 300–303. Retrieved from http://automatica.dei.unipd.it/public/Schenato/PSC/2010_2011/gruppo4-Building_termo_identification/IdentificazioneTermodinamica20072008/Biblio/Articoli/PCR%20vecchio%2082.pdf.
– Jolliffe, I. T. (2010). Principal component analysis (2nd ed.). New York: Springer.
– Kung, E. C., & Sharif, T. A. (1980). Regression forecasting of the onset of the Indian summer monsoon with antecedent upper air conditions. Journal of Applied Meteorology, 19(4), 370–380. Retrieved from http://iri.columbia.edu/~ousmane/print/Onset/ErnestSharif80_JAS.pdf.
– Lott, W. F. (1973). The optimal set of principal component restrictions on a least-squares regression. Communications in Statistics – Theory and Methods, 2(5), 449–464.
– Mason, R. L., & Gunst, R. F. (1985). Selecting principal components in regression. Statistics & Probability Letters, 3(6), 299–301.
– Massy, W. F. (1965). Principal components regression in exploratory statistical research. Journal of the American Statistical Association, 60(309), 234–256. Retrieved from http://automatica.dei.unipd.it/public/Schenato/PSC/2010_2011/gruppo4-Building_termo_identification/IdentificazioneTermodinamica20072008/Biblio/Articoli/PCR%20vecchio%2065.pdf.
– Smith, G., & Campbell, F. (1980). A critique of some ridge regression methods. Journal of the American Statistical Association, 75(369), 74–81. Retrieved from https://cowles.econ.yale.edu/P/cp/p04b/p0496.pdf.

Attribution
Source: Link, Question Author: Daniel, Answer Author: Community
