Classic linear model – model selection

I have a classic linear model with 5 possible regressors. They are uncorrelated with one another and have quite low correlation with the response. I have arrived at a model in which 3 of the regressors have coefficients whose t statistics are significant (p < 0.05). Adding either or both of the remaining 2 variables gives p-values > 0.05 for the t statistics of the added variables. This leads me to believe the 3-variable model is “best”.

However, using the anova(a,b) command in R, where a is the 3-variable model and b is the full model, the p-value for the F statistic is < 0.05, which tells me to prefer the full model over the 3-variable model. How can I reconcile this apparent contradiction?

Thanks
PS
Edit: Some further background. This is homework so I won’t post details, but we are not given details of what the regressors represent – they are just numbered 1 to 5. We are asked to “derive an appropriate model, giving justification”.

Answer

The problem began when you sought a reduced model and used the data rather than subject matter knowledge to pick the predictors. Stepwise variable selection without simultaneous shrinkage to penalize for variable selection, though often used, is an invalid approach. Much has been written about this. There is no reason to trust that the 3-variable model is “best” and there is no reason not to use the original list of pre-specified predictors. P-values computed after using P-values to select variables are not valid. This has been called “double dipping” in the functional imaging literature.
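To make the point about post-selection P-values concrete, here is a minimal simulation sketch in Python (standard library only). It is an illustration under stated assumptions, not the analysis from the question: the response and all 5 regressors are pure independent noise, each regressor is screened with a simple one-variable regression (a reasonable stand-in only because the regressors are uncorrelated), and the t statistic is referred to a normal distribution as an approximation. The function names and default settings are hypothetical.

```python
import math
import random
from statistics import NormalDist


def naive_p_value(x, y):
    """Two-sided p-value for the slope of y on x.

    Uses the usual t statistic r * sqrt((n - 2) / (1 - r^2)) and, as an
    approximation, refers it to a standard normal distribution.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


def false_positive_rates(n_sims=2000, n_obs=60, n_regressors=5, seed=1):
    """Return (rate for a pre-specified regressor, rate for the best-looking one).

    Every regressor is unrelated to y by construction, so any "significant"
    result is a false positive.
    """
    rng = random.Random(seed)
    prespecified_hits = 0
    selected_hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        p_values = []
        for _ in range(n_regressors):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]
            p_values.append(naive_p_value(x, y))
        # Regressor chosen before seeing the data: always the first one.
        prespecified_hits += p_values[0] < 0.05
        # Regressor chosen because it looked best in this very data set.
        selected_hits += min(p_values) < 0.05
    return prespecified_hits / n_sims, selected_hits / n_sims


if __name__ == "__main__":
    pre, sel = false_positive_rates()
    print(f"pre-specified regressor: {pre:.3f}")
    print(f"data-selected regressor: {sel:.3f}")
```

The pre-specified regressor is flagged at close to the nominal 5% rate, while the regressor picked for having the smallest P-value is flagged several times as often, even though nothing is related to the response.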

Here is an analogy. Suppose one is interested in comparing 6 treatments, but uses pairwise t-tests to pick which treatments are “different”, resulting in a reduced set of 4 treatments. The analyst then tests for an overall difference with 3 degrees of freedom. This F test will have inflated type I error. The original F test with 5 d.f. is quite valid.
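The treatment analogy can be sketched as a small Python simulation (standard library only). Several simplifications are assumptions of this sketch rather than part of the analogy as stated: all 6 treatments have the same true mean, the error variance is treated as known (so the between-group statistic is compared with chi-square critical values instead of F), and the pairwise t-test screening is replaced by simply keeping the 2 lowest and 2 highest treatment means. The function names are hypothetical.

```python
import random

# 95th percentiles of the chi-square distribution (standard table values).
CHI2_95_DF5 = 11.070
CHI2_95_DF3 = 7.815


def between_group_statistic(means, n_per_group):
    """n * sum of squared deviations of group means from their average.

    With unit error variance and equal true means this is chi-square
    distributed with len(means) - 1 degrees of freedom, provided the
    groups were NOT chosen by looking at the data.
    """
    center = sum(means) / len(means)
    return n_per_group * sum((m - center) ** 2 for m in means)


def type_one_error_rates(n_sims=20000, n_per_group=10, seed=1):
    """Return (rejection rate of full 5 d.f. test, rate of post-selection 3 d.f. test)."""
    rng = random.Random(seed)
    sd_of_mean = 1 / n_per_group ** 0.5  # SD of a group mean when sigma = 1
    full_rejections = 0
    selected_rejections = 0
    for _ in range(n_sims):
        # Six treatment means under the null: no real differences.
        means = sorted(rng.gauss(0, sd_of_mean) for _ in range(6))
        full_rejections += between_group_statistic(means, n_per_group) > CHI2_95_DF5
        # Keep the 4 treatments that look most "different" (stand-in for
        # the pairwise screening), then test them with 3 d.f.
        kept = means[:2] + means[-2:]
        selected_rejections += between_group_statistic(kept, n_per_group) > CHI2_95_DF3
    return full_rejections / n_sims, selected_rejections / n_sims


if __name__ == "__main__":
    full, selected = type_one_error_rates()
    print(f"full test, 5 d.f.:           {full:.3f}")
    print(f"post-selection test, 3 d.f.: {selected:.3f}")
```

The full test rejects at about the nominal 5% level. The test on the 4 retained treatments rejects far more often, because the 3 d.f. reference distribution does not account for the treatments having been chosen for looking different.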

See http://www.stata.com/support/faqs/stat/stepwise.html for more information.

Attribution
Source : Link , Question Author : LeelaSella , Answer Author : Frank Harrell
