I have in mind the adjusted R-squared formulas proposed by:
- Ezekiel (1930), which I believe is the one currently used in SPSS.
- Olkin and Pratt (1958)
Under what circumstances (if any) should I prefer ‘adjusted’ to ‘unbiased’ R2?
- Ezekiel, M. (1930). Methods of correlation analysis. John Wiley and Sons, New York.
- Olkin I., Pratt J. W. (1958). Unbiased Estimation of Certain Correlation Coefficients. Annals of Mathematical Statistics, 29(1), 201-211.
Without wanting to take credit for @ttnphns’ answer, I wanted to move the answer out of the comments (particularly considering that the link to the article had died). Matt Krause’s answer provides a useful discussion of the distinction between R2 and R2adj but it does not discuss the decision of which R2adj formula to use in any given case.
As I discuss in this answer, Yin and Fan (2001) provide a good overview of the many different formulas for estimating population variance explained ρ2, all of which could potentially be labelled a type of adjusted R2.
They perform a simulation study to assess which of a wide range of adjusted R-squared formulas provides the least biased estimate across different sample sizes, values of ρ2, and predictor intercorrelations. They suggest that the Pratt formula may be a good option, but I don’t think the study was definitive on the matter.
Update: Raju et al. (1997) note that adjusted R2 formulas differ in whether they assume fixed-x or random-x predictors. Specifically, the Ezekiel formula is designed to estimate ρ2 in the fixed-x context, and the Olkin-Pratt and Pratt formulas are designed to estimate ρ2 in the random-x context. There’s not much difference between the Olkin-Pratt and Pratt formulas. Fixed-x assumptions align with planned experiments, where the levels of the predictors are set by design. Random-x assumptions apply when the observed values of the predictor variables are treated as a sample of possible values, as is typically the case in observational studies. See this answer for further discussion. There’s also not much difference between the two types of formulas once the sample size gets moderately large (see here for a discussion of the size of the difference).
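To make the comparison concrete, here is a minimal sketch of the three formulas in the form they are usually quoted in comparison papers such as Yin and Fan (2001). I am writing them from memory, so check them against the original sources before relying on them. Here `n` is the sample size, `p` is the number of predictors, and the function names are my own. The Olkin-Pratt version shown is the common first-order approximation; the exact estimator involves a hypergeometric function and is not implemented here.

```python
def ezekiel_r2(r2, n, p):
    """Ezekiel (1930) adjusted R^2, the fixed-x formula."""
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)


def olkin_pratt_r2_approx(r2, n, p):
    """First-order approximation to the Olkin-Pratt (1958) estimator (random-x)."""
    unexplained = 1.0 - r2
    base = (n - 3) * unexplained / (n - p - 1)
    return 1.0 - base * (1.0 + 2.0 * unexplained / (n - p + 1))


def pratt_r2(r2, n, p):
    """Pratt formula as commonly cited (random-x)."""
    unexplained = 1.0 - r2
    base = (n - 3) * unexplained / (n - p - 1)
    return 1.0 - base * (1.0 + 2.0 * unexplained / (n - p - 2.3))


if __name__ == "__main__":
    # Illustrative values only: sample R^2 of 0.5 with 5 predictors.
    for n in (30, 50, 200, 1000):
        print(n, ezekiel_r2(0.5, n, 5),
              olkin_pratt_r2_approx(0.5, n, 5),
              pratt_r2(0.5, n, 5))
```

Running this for increasing `n` is a quick way to see how quickly the three estimates converge for your own number of predictors.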
Summary of Rules of Thumb
- If you assume that your observations for predictor variables are a random sample from a population, and you want to estimate ρ2 for the full population of both predictors and criterion (i.e., random-x assumption) then use the Olkin-Pratt formula (or the Pratt formula).
- If you assume that your observations are fixed or you don’t want to generalise beyond your observed levels of the predictor, then estimate ρ2 with the Ezekiel formula.
- If you want to know about out-of-sample prediction using the sample regression equation, then you would want to look into some form of cross-validation procedure.
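For the last rule of thumb, the following is a rough sketch of what a k-fold cross-validation estimate of out-of-sample R2 could look like. It is deliberately limited to a single predictor so that it needs nothing beyond the standard library; for multiple predictors you would use a linear algebra or modelling library instead. The function names, the default of 5 folds, and the choice to measure total variation around the training-fold mean are all my own assumptions, not something taken from the references.

```python
import random


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_r2(x, y, k=5, seed=0):
    """k-fold estimate of out-of-sample R^2 for a one-predictor regression."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    sse = 0.0  # squared prediction error on held-out cases
    sst = 0.0  # squared error of predicting the training mean instead
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        train_mean = sum(y[i] for i in train) / len(train)
        for i in fold:
            sse += (y[i] - (a + b * x[i])) ** 2
            sst += (y[i] - train_mean) ** 2
    return 1.0 - sse / sst


if __name__ == "__main__":
    # Simulated data for illustration only.
    rng = random.Random(1)
    xs = [rng.gauss(0, 1) for _ in range(100)]
    ys = [1.0 + 2.0 * xi + rng.gauss(0, 2) for xi in xs]
    print(cross_validated_r2(xs, ys))
```

Unlike the adjusted R2 formulas, this targets how well the fitted sample equation predicts new cases, which is a different quantity from ρ2.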
- Raju, N. S., Bilgic, R., Edwards, J. E., & Fleer, P. F. (1997). Methodology review: Estimation of population validity and cross-validity, and the use of equal weights in prediction. Applied Psychological Measurement, 21(4), 291-305.
- Yin, P., & Fan, X. (2001). Estimating R2 shrinkage in multiple regression: A comparison of different analytical methods. The Journal of Experimental Education, 69(2), 203-224.