I have a classical linear model with 5 candidate regressors. They are uncorrelated with one another and only weakly correlated with the response. I have arrived at a model in which 3 of the regressors have significant t statistics (p < 0.05). Adding either or both of the remaining 2 variables gives p-values > 0.05 for the t statistics of the added variables. This leads me to believe the 3-variable model is “best”.

However, when I run the `anova(a,b)` command in R, where a is the 3-variable model and b is the full model, the p-value for the F statistic is < 0.05, which tells me to prefer the full model over the 3-variable model. How can I reconcile these apparent contradictions?

Thanks

PS

Edit: Some further background. This is homework so I won’t post details, but we are not given details of what the regressors represent – they are just numbered 1 to 5. We are asked to “derive an appropriate model, giving justification”.

**Answer**

The problem began when you sought a reduced model and used the data, rather than subject matter knowledge, to pick the predictors. Stepwise variable selection without simultaneous shrinkage to penalize for the selection, though often used, is an invalid approach. Much has been written about this. There is no reason to trust that the 3-variable model is “best”, and there is no reason not to use the original list of pre-specified predictors. P-values computed after using P-values to select variables are not valid. This has been called “double dipping” in the functional imaging literature.
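To make the “double dipping” point concrete, here is a minimal simulation sketch in Python (standard library only). It is a simplification of your situation, not a reproduction of it: the response and all 5 predictors are pure noise, each predictor is screened with a simple-regression t test, and the most “significant” one is then tested as if it had been chosen in advance. The sample size of 50, the seed, and the function names are assumptions made up for the illustration.

```python
import math
import random

N_OBS = 50          # assumed sample size for the illustration
N_PREDICTORS = 5    # matches the 5 candidate regressors in the question
T_CRIT = 2.0106     # approx. two-sided 5% critical value of t with N_OBS - 2 = 48 d.f.


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for the slope of a simple regression, computed from r."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def simulate_selection(n_sims=2000, seed=1):
    """Return rejection rates for a pre-specified and a data-selected predictor."""
    rng = random.Random(seed)
    hits_prespecified = 0
    hits_selected = 0
    for _ in range(n_sims):
        # The null is true by construction: y is unrelated to every predictor.
        y = [rng.gauss(0, 1) for _ in range(N_OBS)]
        xs = [[rng.gauss(0, 1) for _ in range(N_OBS)] for _ in range(N_PREDICTORS)]
        t_stats = [abs(t_from_r(correlation(x, y), N_OBS)) for x in xs]
        # Honest test: always test predictor 1, chosen before seeing the data.
        hits_prespecified += t_stats[0] > T_CRIT
        # Double dipping: test whichever predictor looked best in this data set.
        hits_selected += max(t_stats) > T_CRIT
    return hits_prespecified / n_sims, hits_selected / n_sims


rate_prespecified, rate_selected = simulate_selection()
print(f"pre-specified predictor: {rate_prespecified:.3f}")
print(f"data-selected predictor: {rate_selected:.3f}")
```

The pre-specified rate should sit near the nominal 0.05, while the data-selected rate should be several times larger (if the five tests were independent it would be about 1 − 0.95^5 ≈ 0.23). The same mechanism is at work when a 3-variable model is chosen by its p-values and then judged by those same p-values.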

Here is an analogy. Suppose one is interested in comparing 6 treatments, but uses pairwise t-tests to pick which treatments are “different”, resulting in a reduced set of 4 treatments. The analyst then tests for an overall difference with 3 degrees of freedom. This F test will have inflated type I error. The original F test with 5 d.f. is quite valid.
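The analogy can also be checked by simulation. The sketch below (Python, standard library only) draws 6 treatment groups with no true differences and compares three F tests: the original test on all 6 groups, a test on 4 groups fixed in advance, and a test on 4 groups picked from the data. As a stand-in for the pairwise t-tests, the selection rule here simply keeps the two groups with the lowest means and the two with the highest; that rule, the group size of 10, and the hard-coded critical values (approximate table values) are assumptions for the illustration.

```python
import random

N_GROUPS = 6
N_PER_GROUP = 10      # assumed group size
F_CRIT_5_54 = 2.386   # approx. 5% critical value of F with (5, 54) d.f.
F_CRIT_3_36 = 2.866   # approx. 5% critical value of F with (3, 36) d.f.


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_treatments(n_sims=4000, seed=2):
    """Return type I error rates for the full, fixed-subset, and selected-subset F tests."""
    rng = random.Random(seed)
    full = fixed = selected = 0
    for _ in range(n_sims):
        # No true treatment effects: every group is drawn from N(0, 1).
        groups = [[rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
                  for _ in range(N_GROUPS)]
        # Original F test on all 6 treatments, 5 d.f.
        full += f_statistic(groups) > F_CRIT_5_54
        # 4 treatments chosen before seeing the data, 3 d.f.
        fixed += f_statistic(groups[:4]) > F_CRIT_3_36
        # 4 treatments chosen because their means looked most different, 3 d.f.
        ranked = sorted(groups, key=lambda g: sum(g) / len(g))
        extremes = ranked[:2] + ranked[-2:]
        selected += f_statistic(extremes) > F_CRIT_3_36
    return full / n_sims, fixed / n_sims, selected / n_sims


rate_full, rate_fixed, rate_picked = simulate_treatments()
print(f"all 6 treatments, 5 d.f.:        {rate_full:.3f}")
print(f"4 fixed treatments, 3 d.f.:      {rate_fixed:.3f}")
print(f"4 data-picked treatments, 3 d.f.: {rate_picked:.3f}")
```

The first two rates should stay close to 0.05, since nothing in those tests depends on the outcome data. The third should come out clearly above 0.05, which is the inflated type I error described above.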

See http://www.stata.com/support/faqs/stat/stepwise.html and the stepwise-regression tag for more information.

**Attribution**

*Source: Link, Question Author: LeelaSella, Answer Author: Frank Harrell*