# How to choose between the different adjusted $R^2$ formulas?

I have in mind the adjusted R-squared formulas proposed by:

• Ezekiel (1930), which I believe is the one currently used in SPSS.

$$R^2_{\rm adjusted} = 1 - \frac{(N-1)}{(N-p-1)} (1-R^2)$$

• Olkin and Pratt (1958)

$$R^2_{\rm unbiased} = 1 - \frac{(N-3)(1-R^2)}{(N-p-1)} - \frac{2(N-3)(1-R^2)^2}{(N-p-1)(N-p+1)}$$

Under what circumstances (if any) should I prefer ‘adjusted’ to ‘unbiased’ $R^2$?

References

1. Ezekiel, M. (1930). Methods of correlation analysis. John Wiley and Sons, New York.
2. Olkin, I., & Pratt, J. W. (1958). Unbiased estimation of certain correlation coefficients. Annals of Mathematical Statistics, 29(1), 201-211.

Without wanting to take credit for @ttnphns’ answer, I wanted to move the answer out of the comments (particularly considering that the link to the article had died). Matt Krause’s answer provides a useful discussion of the distinction between $R^2$ and $R^2_{adj}$ but it does not discuss the decision of which $R^2_{adj}$ formula to use in any given case.

As I discuss in this answer, Yin and Fan (2001) provide a good overview of the many different formulas for estimating population variance explained $\rho^2$, all of which could potentially be labelled a type of adjusted $R^2$.

They perform a simulation to assess which of a wide range of adjusted $R^2$ formulas gives the least biased estimate of $\rho^2$ across different sample sizes, values of $\rho^2$, and predictor intercorrelations. They suggest that the Pratt formula may be a good option, but I don’t think the study was definitive on the matter.

Update: Raju et al. (1997) note that adjusted $R^2$ formulas differ according to whether they are designed to estimate $\rho^2$ assuming fixed-x or random-x predictors. Specifically, the Ezekiel formula is designed to estimate $\rho^2$ in the fixed-x context, and the Olkin-Pratt and Pratt formulas are designed to estimate $\rho^2$ in the random-x context. There’s not much difference between the Olkin-Pratt and Pratt formulas. Fixed-x assumptions align with planned experiments; random-x assumptions apply when you assume that the observed values of the predictor variables are a sample of possible values, as is typically the case in observational studies. See this answer for further discussion. There’s also not much difference between the two types of formulas once the sample size gets moderately large (see here for a discussion of the size of the difference).
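To get a feel for how quickly the two types of formulas converge, here is a minimal Python sketch that evaluates the Ezekiel formula and the Olkin-Pratt formula exactly as they are written in the question. The function names and the example values ($R^2 = 0.3$, $p = 5$, and the list of sample sizes) are my own illustrative choices, not something taken from the cited papers.

```python
def ezekiel_adjusted_r2(r2, n, p):
    """Ezekiel (1930) adjusted R^2 for sample R^2 `r2`, n cases, p predictors."""
    return 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)


def olkin_pratt_r2(r2, n, p):
    """Olkin-Pratt formula in the form quoted in the question."""
    first = (n - 3) * (1.0 - r2) / (n - p - 1)
    second = 2.0 * (n - 3) * (1.0 - r2) ** 2 / ((n - p - 1) * (n - p + 1))
    return 1.0 - first - second


# Illustrative values only: the same sample R^2 and number of predictors,
# evaluated at increasing sample sizes.
r2, p = 0.3, 5
for n in (20, 50, 200, 1000):
    ez = ezekiel_adjusted_r2(r2, n, p)
    op = olkin_pratt_r2(r2, n, p)
    print(f"N={n:5d}  Ezekiel={ez:.4f}  Olkin-Pratt={op:.4f}  gap={op - ez:.4f}")
```

Both functions need $N > p + 1$ so that the denominators are positive. Either estimate can come out negative when $R^2$ is small relative to $p/N$.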

### Summary of Rules of Thumb

• If you assume that your observations for predictor variables are a random sample from a population, and you want to estimate $\rho^2$ for the full population of both predictors and criterion (i.e., random-x assumption) then use the Olkin-Pratt formula (or the Pratt formula).
• If you assume that your observations are fixed or you don’t want to generalise beyond your observed levels of the predictor, then estimate $\rho^2$ with the Ezekiel formula.
• If you want to know about out-of-sample prediction using the sample regression equation, then look into some form of cross-validation procedure.
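For the last rule of thumb, the following is a rough sketch of one possible cross-validation procedure (k-fold), written in plain Python so that it runs without extra packages. Everything here is an assumption for illustration: the helper names, the choice of five folds, the definition of out-of-sample $R^2$ (held-out squared error relative to predicting the training-fold mean), and the simulated data. In practice you would probably use an established statistics library rather than hand-rolled least squares.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Prediction from the fitted equation for one row of predictor values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def cross_validated_r2(rows, y, n_folds=5, seed=0):
    """k-fold estimate of out-of-sample R^2 for the fitted regression equation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    sse = sst = 0.0
    for f in range(n_folds):
        test = idx[f::n_folds]  # every n_folds-th shuffled case is held out
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        coefs = fit_ols([rows[i] for i in train], [y[i] for i in train])
        train_mean = sum(y[i] for i in train) / len(train)
        for i in test:
            sse += (y[i] - predict(coefs, rows[i])) ** 2
            sst += (y[i] - train_mean) ** 2
    return 1.0 - sse / sst


# Simulated data (hypothetical): three predictors with true weights 1, 2, 3
# plus standard normal noise.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [x[0] + 2 * x[1] + 3 * x[2] + rng.gauss(0, 1) for x in X]
cv_r2 = cross_validated_r2(X, y)
print(round(cv_r2, 3))
```

Note that this targets a different quantity from the adjusted $R^2$ formulas above: it estimates how well the sample regression equation predicts new cases, not $\rho^2$ itself.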

### References

• Raju, N. S., Bilgic, R., Edwards, J. E., & Fleer, P. F. (1997). Methodology review: Estimation of population validity and cross-validity, and the use of equal weights in prediction. Applied Psychological Measurement, 21(4), 291-305.
• Yin, P., & Fan, X. (2001). Estimating $R^2$ shrinkage in multiple regression: A comparison of different analytical methods. The Journal of Experimental Education, 69(2), 203-224.