How to choose between the different Adjusted $R^2$ formulas?

I have in mind the adjusted R-squared formulas proposed by:

  • Ezekiel (1930), which I believe is the one currently used in SPSS.

    $$R^2_{\text{adjusted}} = 1 - \frac{N-1}{N-p-1}\,(1-R^2)$$

  • Olkin and Pratt (1958)

    $$R^2_{\text{unbiased}} = 1 - \frac{(N-3)(1-R^2)}{N-p-1} - \frac{2(N-3)(1-R^2)^2}{(N-p-1)(N-p+1)}$$
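To make the two formulas concrete, here is a minimal Python sketch of both. It assumes N is the sample size and p the number of predictors excluding the intercept (the symbols are not defined above, so this is an assumption). The third function goes slightly beyond the question: it assumes the exact Olkin and Pratt (1958) estimator has the hypergeometric form $1 - \frac{N-3}{N-p-1}(1-R^2)\,{}_2F_1\!\left(1, 1; \tfrac{N-p+1}{2}; 1-R^2\right)$, of which the 'unbiased' formula above is the two-term truncation. The series is summed directly rather than through a library; all function names are hypothetical.

```python
def ezekiel_adjusted_r2(r2: float, n: int, p: int) -> float:
    """Ezekiel (1930): 1 - (N - 1) / (N - p - 1) * (1 - R^2)."""
    if n - p - 1 <= 0:
        raise ValueError("need N > p + 1")
    return 1 - (n - 1) / (n - p - 1) * (1 - r2)


def olkin_pratt_approx_r2(r2: float, n: int, p: int) -> float:
    """The 'unbiased' formula above (two-term Olkin-Pratt approximation)."""
    if n - p - 1 <= 0:
        raise ValueError("need N > p + 1")
    q = 1 - r2
    return (1 - (n - 3) * q / (n - p - 1)
            - 2 * (n - 3) * q ** 2 / ((n - p - 1) * (n - p + 1)))


def olkin_pratt_series_r2(r2: float, n: int, p: int,
                          tol: float = 1e-12, max_terms: int = 100_000) -> float:
    """Assumed exact Olkin-Pratt form, summing 2F1(1, 1; c; q) term by term.

    Requires 0 < R^2 <= 1 so that q = 1 - R^2 < 1 and the series converges.
    """
    if n - p - 1 <= 0:
        raise ValueError("need N > p + 1")
    if not 0 < r2 <= 1:
        raise ValueError("need 0 < R^2 <= 1 for this series")
    q = 1 - r2
    c = (n - p + 1) / 2
    # Term k of 2F1(1, 1; c; q) is k! * q**k / (c)_k; consecutive terms have
    # ratio (k + 1) * q / (c + k), which is < 1 here, so terms shrink steadily.
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (k + 1) * q / (c + k)
        total += term
        if term < tol:
            break
    return 1 - (n - 3) / (n - p - 1) * q * total


if __name__ == "__main__":
    r2, n, p = 0.40, 50, 5
    print("Ezekiel adjusted:     ", round(ezekiel_adjusted_r2(r2, n, p), 4))
    print("Olkin-Pratt (2 terms):", round(olkin_pratt_approx_r2(r2, n, p), 4))
    print("Olkin-Pratt (series): ", round(olkin_pratt_series_r2(r2, n, p), 4))
```

With $R^2 = 0.40$, $N = 50$ and $p = 5$ these give roughly 0.332, 0.342 and 0.341: the 'unbiased' estimate sits a little above Ezekiel's here, and truncating the series changes it by less than 0.001.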

Under what circumstances (if any) should I prefer ‘adjusted’ to ‘unbiased’ $R^2$?

References

  1. Ezekiel, M. (1930). Methods of correlation analysis. John Wiley and Sons, New York.
  2. Olkin, I., & Pratt, J. W. (1958). Unbiased Estimation of Certain Correlation Coefficients. Annals of Mathematical Statistics, 29(1), 201-211.

Answer

Without wanting to take credit for @ttnphns’ answer, I wanted to move the answer out of the comments (particularly as the link to the article had died). Matt Krause’s answer provides a useful discussion of the distinction between $R^2$ and $R^2_{adj}$, but it does not discuss which $R^2_{adj}$ formula to use in a given case.

As I discuss in this answer, Yin and Fan (2001) provide a good overview of the many different formulas for estimating the population variance explained, $\rho^2$, all of which could potentially be labelled a type of adjusted $R^2$.

They use simulation to assess which of a wide range of adjusted $R^2$ formulas gives the least biased estimate of $\rho^2$ across different sample sizes, values of $\rho^2$, and degrees of predictor intercorrelation. They suggest that the Pratt formula may be a good option, but I don’t think the study was definitive on the matter.
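The kind of simulation Yin and Fan (2001) ran can be sketched in a few lines. This is not their design: it is a minimal random-x version, assuming independent standard-normal predictors with equal slopes, unit error variance, and only the Ezekiel formula and the two-term Olkin-Pratt formula from the question. It reports the mean bias (estimate minus $\rho^2$) of each estimator; `ols_r2` and `simulate_bias` are hypothetical helper names.

```python
import random
import statistics


def ols_r2(X, y):
    """In-sample R^2 of an OLS fit with intercept (centred normal equations)."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for i in range(col + 1, p):
            f = A[i][col] / A[col][col]
            for j in range(col, p):
                A[i][j] -= f * A[col][j]
            b[i] -= f * b[col]
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    sse = sum((v - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
              for r, v in zip(Xc, yc))
    sst = sum(v * v for v in yc)
    return 1 - sse / sst


def simulate_bias(n, p, rho2, reps=500, seed=1):
    """Mean of (estimate - rho^2) over `reps` random-x samples."""
    rng = random.Random(seed)
    # With p independent N(0, 1) predictors, equal slopes b and unit error
    # variance, rho^2 = p * b^2 / (p * b^2 + 1); solve for b.
    b = (rho2 / (p * (1 - rho2))) ** 0.5
    errors = {"sample R2": [], "Ezekiel": [], "Olkin-Pratt (2 terms)": []}
    for _ in range(reps):
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        y = [b * sum(row) + rng.gauss(0, 1) for row in X]
        r2 = ols_r2(X, y)
        q = 1 - r2
        ezekiel = 1 - (n - 1) / (n - p - 1) * q
        olkin_pratt = 1 - (n - 3) * q / (n - p - 1) * (1 + 2 * q / (n - p + 1))
        errors["sample R2"].append(r2 - rho2)
        errors["Ezekiel"].append(ezekiel - rho2)
        errors["Olkin-Pratt (2 terms)"].append(olkin_pratt - rho2)
    return {name: statistics.fmean(e) for name, e in errors.items()}


if __name__ == "__main__":
    for n in (20, 50, 100):
        print(n, simulate_bias(n, p=5, rho2=0.3))
```

Under these assumptions one would expect the sample $R^2$ to show a clear upward bias that both corrections largely remove. Varying predictor intercorrelation, as Yin and Fan did, would require generating correlated predictors, which this sketch omits.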

Update: Raju et al. (1997) note that adjusted $R^2$ formulas differ according to whether they are designed to estimate $\rho^2$ under a fixed-x or a random-x assumption about the predictors. Specifically, the Ezekiel formula is designed to estimate $\rho^2$ in the fixed-x context, whereas the Olkin-Pratt and Pratt formulas are designed to estimate $\rho^2$ in the random-x context. Fixed-x assumptions align with planned experiments, where the researcher sets the predictor values; random-x assumptions align with situations where the observed predictor values are treated as a sample of possible values, as is typically the case in observational studies. See this answer for further discussion.

In practice, there’s not much difference between the Olkin-Pratt and Pratt formulas, and there’s also not much difference between the two types of formulas once the sample size gets moderately large (see here for a discussion of the size of the difference).
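The difference between the fixed-x (Ezekiel) and random-x (two-term Olkin-Pratt) formulas can be written out directly. Subtracting the two formulas from the question gives

$$R^2_{\text{unbiased}} - R^2_{\text{adjusted}} = \frac{2(1-R^2)}{N-p-1}\left[1 - \frac{(N-3)(1-R^2)}{N-p+1}\right],$$

which shrinks roughly in proportion to $1/N$. A minimal sketch tabulating it, assuming an illustrative $R^2 = 0.3$ and $p = 5$ (values chosen only for illustration, function names hypothetical):

```python
def ezekiel(r2, n, p):
    """Fixed-x adjusted R^2 (Ezekiel, 1930)."""
    return 1 - (n - 1) / (n - p - 1) * (1 - r2)


def olkin_pratt_approx(r2, n, p):
    """Random-x estimate: the two-term Olkin-Pratt formula from the question."""
    q = 1 - r2
    return 1 - (n - 3) * q / (n - p - 1) * (1 + 2 * q / (n - p + 1))


def gap_table(r2=0.3, p=5, sizes=(20, 50, 100, 200, 500, 1000)):
    """(N, Olkin-Pratt minus Ezekiel) for each sample size N."""
    return [(n, olkin_pratt_approx(r2, n, p) - ezekiel(r2, n, p)) for n in sizes]


for n, gap in gap_table():
    print(f"N = {n:4d}   Olkin-Pratt - Ezekiel = {gap:+.4f}")
```

For these inputs the gap is about 0.026 at N = 20 and falls below 0.001 by N = 500.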

Summary of Rules of Thumb

  • If you assume that your observations for predictor variables are a random sample from a population, and you want to estimate $\rho^2$ for the full population of both predictors and criterion (i.e., random-x assumption) then use the Olkin-Pratt formula (or the Pratt formula).
  • If you assume that your predictor values are fixed, or you don’t want to generalise beyond the observed levels of the predictors, then estimate $\rho^2$ with the Ezekiel formula.
  • If you want to know about out-of-sample prediction using the sample regression equation, then you should look into some form of cross-validation procedure.
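Cross-validation estimates out-of-sample performance directly rather than through a correction formula. Below is a minimal k-fold sketch in plain Python, assuming ordinary least squares with an intercept, folds formed by shuffling row indices, and a cross-validated $R^2$ defined as one minus the pooled held-out squared error over the total sum of squares around the full-sample mean of y (one common definition among several). `fit_ols`, `predict` and `cv_r2` are hypothetical names.

```python
import random


def fit_ols(X, y):
    """OLS with intercept via the normal equations; returns [b0, b1, ..., bp]."""
    Z = [[1.0] + list(row) for row in X]  # prepend a column of ones
    k = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    b = [sum(z[i] * v for z, v in zip(Z, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for i in range(col + 1, k):
            f = A[i][col] / A[col][col]
            for j in range(col, k):
                A[i][j] -= f * A[col][j]
            b[i] -= f * b[col]
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, k))) / A[i][i]
    return coef


def predict(coef, row):
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))


def cv_r2(X, y, folds=5, seed=0):
    """k-fold cross-validated R^2: 1 - pooled held-out SSE / total SS of y."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sse = 0.0
    for f in range(folds):
        held_out = set(idx[f::folds])
        train = [i for i in idx if i not in held_out]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        sse += sum((y[i] - predict(coef, X[i])) ** 2 for i in held_out)
    ybar = sum(y) / len(y)
    sst = sum((v - ybar) ** 2 for v in y)
    return 1 - sse / sst


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
    y = [1 + 0.5 * x[0] - 0.3 * x[1] + rng.gauss(0, 1) for x in X]
    coef = fit_ols(X, y)
    ybar = sum(y) / len(y)
    r2_in = 1 - (sum((v - predict(coef, x)) ** 2 for x, v in zip(X, y))
                 / sum((v - ybar) ** 2 for v in y))
    print("in-sample R^2:", round(r2_in, 3), " 5-fold CV R^2:", round(cv_r2(X, y), 3))
```

Because each held-out row is predicted by coefficients fitted without it, the cross-validated value is typically lower than the in-sample $R^2$, and it speaks to the performance of the sample equation itself rather than to $\rho^2$.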

References

  • Raju, N. S., Bilgic, R., Edwards, J. E., & Fleer, P. F. (1997). Methodology review: Estimation of population validity and cross-validity, and the use of equal weights in prediction. Applied Psychological Measurement, 21(4), 291-305.
  • Yin, P., & Fan, X. (2001). Estimating $R^2$ shrinkage in multiple regression: A comparison of different analytical methods. The Journal of Experimental Education, 69(2), 203-224.

Attribution
Source: Link, Question Author: user1205901 – Слава Україні, Answer Author: Community
