I have some questions about the AIC and hope you can help me. I applied model selection (backward, or forward) based on the AIC to my data, and some of the selected variables ended up with p-values > 0.05. I know people say we should select models based on the AIC instead of the p-value, so it seems that the AIC and the p-value are two different concepts. Could someone tell me what the difference is? What I understand so far is that:

For backward selection using the AIC, suppose we have 3 variables (var1, var2, var3) and the AIC of this model is AIC*. If excluding any one of these three variables would not end up with an AIC significantly lower than AIC* (in terms of a chi-square distribution with df = 1), then we would say these three variables are the final result.

A significant p-value for a variable (e.g. var1) in a three-variable model means that the standardized effect size of that variable is significantly different from 0 (according to a Wald or t-test).

What’s the fundamental difference between these two methods? How do I interpret it if there are some variables having non-significant p-values in my best model (obtained via the AIC)?

**Answer**

AIC and its variants are closer to variations on $R^2$ than to p-values of each regressor. More precisely, they are penalized versions of the log-likelihood.

You don’t want to test differences of AIC using chi-squared. You could test differences of the log-likelihood using chi-squared (if the models are nested). For AIC, lower is better (in *most* implementations of it, anyway). No further adjustment needed.

You really want to avoid automated model selection methods, if you possibly can. If you must use one, try LASSO or LAR.
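For reference, a minimal LASSO sketch with scikit-learn (the data are synthetic and the setup is my own assumption): the L1 penalty shrinks some coefficients exactly to zero, so selection and estimation happen in one step, and `LassoCV` chooses the penalty strength by cross-validation rather than by repeated significance tests.

```python
# Minimal LASSO example: the L1 penalty zeroes out irrelevant coefficients,
# with the penalty strength chosen by cross-validation.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:2] = [2.0, -1.5]                  # only the first two predictors matter
y = X @ beta + rng.normal(size=n)

lasso = LassoCV(cv=5).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of nonzero coefficients
print("selected columns:", selected)
```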

**Attribution**
*Source : Link , Question Author : tiantianchen , Answer Author : Peter Flom*