It is well established, at least among statisticians of some higher calibre, that models whose AIC values fall within a certain threshold of the minimum should be considered as appropriate as the model that minimizes the AIC. For example, in [1, p. 221] we find
> Then models with small GCV or AIC would be considered best. Of course one should not just blindly minimize GCV or AIC. Rather, all models with reasonably small GCV or AIC values should be considered as potentially appropriate and evaluated according to their simplicity and scientific relevance.
Similarly, in [2, p.144] we have
> It has been suggested (Duong, 1984) that models with AIC values within c of the minimum value should be considered competitive (with c=2 as a typical value). Selection from among the competitive models can then be based on such factors as whiteness of the residuals (Section 5.3) and model simplicity.
- Ruppert, D.; Wand, M. P. & Carroll, R. J. Semiparametric Regression, Cambridge University Press, 2003
- Brockwell, P. J. & Davis, R. A. Introduction to Time Series and Forecasting, Springer, 1996
So given the above, which of the two models below should be preferred?
```r
print( lh300 <- arima(lh, order=c(3,0,0)) )
# ... sigma^2 estimated as 0.1787:  log likelihood = -27.09,  aic = 64.18
print( lh100 <- arima(lh, order=c(1,0,0)) )
# ... sigma^2 estimated as 0.1975:  log likelihood = -29.38,  aic = 64.76
```
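One way to read those numbers, assuming Duong's c = 2 rule quoted above: reconstruct the reported AIC values from the log likelihoods and check the gap against the threshold. This is only an illustrative sketch; the helper `aic` and the spelled-out parameter counts are mine, not part of the original output (R's `arima()` counts the AR coefficients, the intercept, and sigma^2 among the parameters, so the AR(3) fit has 5 parameters and the AR(1) fit has 3).

```r
# AIC = -2 * log likelihood + 2 * (number of estimated parameters).
# Log likelihoods are taken from the arima() output above.
aic <- function(loglik, npar) -2 * loglik + 2 * npar

aic_ar3 <- aic(-27.09, 5)   # AR(3) + intercept + sigma^2 -> 64.18
aic_ar1 <- aic(-29.38, 3)   # AR(1) + intercept + sigma^2 -> 64.76

# The AIC gap is 0.58, far below the typical threshold c = 2,
# so by the quoted rule both models count as "competitive".
aic_ar1 - aic_ar3
(aic_ar1 - aic_ar3) < 2
```

So the rule alone does not pick a winner here; it only says the simpler AR(1) model cannot be dismissed on AIC grounds.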
More generally, when is it appropriate to select models by blindly minimizing the AIC or related statistic?
Paraphrasing from Cosma Shalizi's lecture notes on the truth about linear regression: thou shalt never choose a model just because it happens to minimise a statistic like the AIC, for

> Every time someone solely uses an AIC statistic for model selection, an angel loses its wings. Every time someone thoughtlessly minimises it, an angel not only loses its wings, but is cast out of Heaven and falls in most extreme agony into the everlasting fire.