Interpreting the drop1 output in R

In R, the drop1 command outputs something neat.
These two commands should get you some output:
example(step)  # -> fits lm1 on the swiss data
drop1(lm1, test="F")

Mine looks like this:

> drop1(lm1, test="F")
Single term deletions

Model:
Fertility ~ Agriculture + Examination + Education + Catholic + 
    Infant.Mortality
                 Df Sum of Sq    RSS    AIC F value     Pr(F)    
<none>                        2105.0 190.69                      
Agriculture       1    307.72 2412.8 195.10  5.9934  0.018727 *  
Examination       1     53.03 2158.1 189.86  1.0328  0.315462    
Education         1   1162.56 3267.6 209.36 22.6432 2.431e-05 ***
Catholic          1    447.71 2552.8 197.75  8.7200  0.005190 ** 
Infant.Mortality  1    408.75 2513.8 197.03  7.9612  0.007336 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

What does all of this mean? I’m assuming that the “stars” help in deciding which input variables are to be kept.
Looking at the output above, I want to throw away the “Examination” variable and focus on the “Education” variable. Is this interpretation correct?

Also, the AIC value, lower is better, yes?

Ed. Please note the Community Wiki answer below and add to it if you see fit, to clarify this output.

Answer

drop1 gives you a comparison of models based on the AIC criterion, and when using the option test="F" you add a “type II ANOVA” to it, as explained in the help files. As long as you only have continuous variables, this table is exactly equivalent to the coefficient table of summary(lm1), as the F-values are just those t-values squared. The p-values are exactly the same.
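To check the equivalence yourself, here is a minimal sketch. It assumes lm1 is the model that example(step) fits, i.e. Fertility regressed on all other columns of swiss; the names d1, t_sq, f_vals and the p-value variables are just illustrative.

```r
# Refit the model from example(step) so the snippet is self-contained.
lm1 <- lm(Fertility ~ ., data = swiss)
d1  <- drop1(lm1, test = "F")

# Squared t-values from the coefficient table (intercept dropped).
t_sq <- unname(coef(summary(lm1))[-1, "t value"]^2)

# F-values from drop1 (the first row, <none>, has no test).
f_vals <- d1[["F value"]][-1]

# p-values from both tables; the last drop1 column holds the p-values.
p_summary <- unname(coef(summary(lm1))[-1, "Pr(>|t|)"])
p_drop1   <- d1[[ncol(d1)]][-1]

print(all.equal(t_sq, f_vals))
print(all.equal(p_summary, p_drop1))
```

This only lines up row by row because every term here is a single continuous variable with one degree of freedom; with factors, one drop1 row would correspond to several coefficients.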

So what to do with it? Interpret it in exactly that way: each row tells you whether the model without that term is “significantly” different from the model with that term. Mind the quotation marks around “significantly”, as the significance here cannot be interpreted the way most people think (multiple-testing problem and all…).
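One way to see the “model with versus model without” reading is to fit both models by hand and compare them. This is a sketch under the same assumption about lm1 as in the question; lm_no_exam and cmp are names made up for the illustration.

```r
# Full model, as fitted by example(step).
lm1 <- lm(Fertility ~ ., data = swiss)

# Same model with the Examination term removed.
lm_no_exam <- update(lm1, . ~ . - Examination)

# F test of the smaller model against the full one.
cmp <- anova(lm_no_exam, lm1)
print(cmp)

# The F statistic should match the Examination row of drop1.
d1 <- drop1(lm1, test = "F")
print(all.equal(cmp$F[2], d1["Examination", "F value"]))
```

Each row of the drop1 table is one such pairwise comparison, which is also why reading five of them at once runs into the multiple-testing issue.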

And regarding the AIC: yes, the lower the better. AIC is a value that applies to the model, not to the variable: each row shows the AIC of the model with that term dropped. So the best model from that output would be the one without the variable Examination.
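To make the “AIC belongs to the model” point concrete, a small sketch that refits the reduced model and asks for its AIC directly. The name lm_no_exam is an assumption for the example; extractAIC() returns the equivalent degrees of freedom followed by the AIC.

```r
# Full model and the model without Examination.
lm1        <- lm(Fertility ~ ., data = swiss)
lm_no_exam <- update(lm1, . ~ . - Examination)

# Second element is the AIC that drop1 reports.
aic_full    <- extractAIC(lm1)[2]         # compare with the <none> row
aic_no_exam <- extractAIC(lm_no_exam)[2]  # compare with the Examination row

d1 <- drop1(lm1, test = "F")
print(c(full = aic_full, without_examination = aic_no_exam))
print(d1[c("<none>", "Examination"), "AIC"])
```

The two printed pairs should agree, and the model without Examination should show the lower value, in line with the table in the question.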

Mind you, the calculations of both the AIC and the F statistic differ from those of the R functions AIC(lm1) and anova(lm1), respectively. For AIC(), that information is given on the help page of extractAIC(). For the anova() function, it’s rather obvious that type I and type II SS are not the same.
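Both differences can be inspected directly. The sketch below again assumes the lm1 from the question. The constant n * (1 + log(2 * pi)) + 2 is my own working from the formulas for an unweighted lm fit, not something quoted from the help page, so treat it as an assumption to verify.

```r
lm1        <- lm(Fertility ~ ., data = swiss)
lm_no_exam <- update(lm1, . ~ . - Examination)
n          <- nrow(swiss)

# AIC() and extractAIC() give different numbers, but for models fitted
# to the same data the gap is the same, so model rankings agree.
gap_full    <- AIC(lm1)        - extractAIC(lm1)[2]
gap_no_exam <- AIC(lm_no_exam) - extractAIC(lm_no_exam)[2]
print(c(gap_full, gap_no_exam, n * (1 + log(2 * pi)) + 2))

# anova(lm1) tests terms sequentially (type I); drop1 tests each term
# given all the others. Only the last term gets the same F in both.
a1 <- anova(lm1)
d1 <- drop1(lm1, test = "F")
print(c(anova = a1["Agriculture", "F value"],
        drop1 = d1["Agriculture", "F value"]))
print(c(anova = a1["Infant.Mortality", "F value"],
        drop1 = d1["Infant.Mortality", "F value"]))
```

Because anova(lm1) is sequential, reordering the terms in the formula changes its table, while the drop1 table stays the same.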

I’m trying not to be rude, but if you don’t understand what is explained in the help files there, you shouldn’t be using the function in the first place. Stepwise regression is incredibly tricky, jeopardizing your p-values in a most profound manner. So again, do not base your decisions on the p-values. Your model should reflect your hypothesis and not the other way around.

Attribution
Source: Link, Question Author: gakera, Answer Author: Joris Meys
