I’m experimenting with R and found that anova() needs an object of class lm. But why should I continue with an ANOVA after this:
> x <- data.frame(rand=rnorm(100), factor=sample(c("A","B","C"), 100, replace=TRUE))
> head(x)
        rand factor
1  0.9640502      B
2 -0.5038238      C
3 -1.5699734      A
4 -0.8422324      B
5  0.2489113      B
6 -1.4685439      A
> model <- lm(x$rand ~ x$factor)
> summary(model)

Call:
lm(formula = x$rand ~ x$factor)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.74118 -0.89259  0.02904  0.59726  3.19762 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1878     0.1845  -1.018    0.311
x$factorB    -0.1284     0.2689  -0.477    0.634
x$factorC     0.4246     0.2689   1.579    0.118

Residual standard error: 1.107 on 97 degrees of freedom
Multiple R-squared:  0.04345,  Adjusted R-squared:  0.02372 
F-statistic: 2.203 on 2 and 97 DF,  p-value: 0.1160
Doesn’t this already tell me everything I need? I’m curious why I would continue with anova(model).
Let’s look at what you get when you actually use the anova() function (the numbers differ from those in your example, since I don’t know which seed you used to generate the random numbers, but the point remains the same):
> anova(model)
Analysis of Variance Table

Response: x$rand
          Df  Sum Sq Mean Sq F value Pr(>F)
x$factor   2   4.142  2.0708  1.8948 0.1559
Residuals 97 106.009  1.0929
The F-test for the factor simultaneously tests H0: β1 = β2 = 0, i.e., the hypothesis that the factor as a whole is not significant. A common strategy is to first test this omnibus hypothesis before digging into which levels of the factor differ from each other.
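That two-step strategy can be sketched as follows (this is my addition, not part of your output; the seed and the use of aov()/TukeyHSD() for the pairwise follow-up are assumptions for illustration):

```r
# Step 1: omnibus F-test for the factor; Step 2: pairwise comparisons of levels.
set.seed(42)  # seed chosen arbitrarily; the original example did not set one
x <- data.frame(rand   = rnorm(100),
                factor = factor(sample(c("A", "B", "C"), 100, replace = TRUE)))

fit <- aov(rand ~ factor, data = x)  # same model fit as lm()
anova(fit)      # omnibus test: is the factor significant at all?
TukeyHSD(fit)   # if so: which pairs of levels (A-B, A-C, B-C) differ?
```

Only if the omnibus test rejects would one typically move on to the pairwise comparisons, which TukeyHSD() adjusts for multiplicity.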
Also, you can use the anova() function for full versus reduced model tests. For example:
> x <- data.frame(rand=rnorm(100), factor=sample(c("A","B","C"), 100, replace=TRUE),
+                 y1=rnorm(100), y2=rnorm(100))
> model1 <- lm(x$rand ~ x$factor + x$y1 + x$y2)
> model2 <- lm(x$rand ~ x$factor)
> anova(model2, model1)
Analysis of Variance Table

Model 1: x$rand ~ x$factor
Model 2: x$rand ~ x$factor + x$y1 + x$y2
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     97 105.06                           
2     95 104.92  2   0.13651 0.0618 0.9401
which compares the full model (the factor plus the two covariates y1 and y2) against the reduced model, testing the null hypothesis that the slopes of the two covariates are simultaneously equal to zero.
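To see what anova() is doing under the hood, here is a sketch that computes the same F statistic by hand from the residual sums of squares (the seed and variable names are my own; the formula is the standard full-vs-reduced F-test):

```r
# Full-vs-reduced F-test computed by hand, matching anova(reduced, full)
set.seed(1)  # seed chosen arbitrarily for reproducibility
x <- data.frame(rand = rnorm(100),
                f    = factor(sample(c("A", "B", "C"), 100, replace = TRUE)),
                y1   = rnorm(100),
                y2   = rnorm(100))
full    <- lm(rand ~ f + y1 + y2, data = x)
reduced <- lm(rand ~ f, data = x)

rss_full    <- sum(resid(full)^2)     # RSS of the full model
rss_reduced <- sum(resid(reduced)^2)  # RSS of the reduced model
df_diff     <- df.residual(reduced) - df.residual(full)  # 2 extra slopes

# F = (drop in RSS per extra parameter) / (residual mean square of full model)
F_stat <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df.residual(full))
p_val  <- pf(F_stat, df_diff, df.residual(full), lower.tail = FALSE)

anova(reduced, full)       # same F and p-value as computed above
c(F = F_stat, p = p_val)
```

The drop in RSS from reduced to full is what the "Sum of Sq" column in the ANOVA table reports.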