# What are the ANOVA’s benefits over a normal linear model?

I’m experimenting with R and found that anova() needs an object of class "lm". But why should I continue with an ANOVA after this:

> x <- data.frame(rand=rnorm(100), factor=sample(c("A","B","C"),100,replace=TRUE))
> head(x)
        rand factor
1  0.9640502      B
2 -0.5038238      C
3 -1.5699734      A
4 -0.8422324      B
5  0.2489113      B
6 -1.4685439      A

> model <- lm(x$rand ~ x$factor)
> summary(model)

Call:
lm(formula = x$rand ~ x$factor)

Residuals:
     Min       1Q   Median       3Q      Max
-2.74118 -0.89259  0.02904  0.59726  3.19762

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1878     0.1845  -1.018    0.311
x$factorB    -0.1284     0.2689  -0.477    0.634
x$factorC     0.4246     0.2689   1.579    0.118

Residual standard error: 1.107 on 97 degrees of freedom
Multiple R-squared: 0.04345, Adjusted R-squared: 0.02372
F-statistic: 2.203 on 2 and 97 DF,  p-value: 0.1160


Doesn’t this tell me everything I need? I’m curious why one would continue with anova(model).

Let’s look at what you get when you actually use the anova() function (the numbers differ from your example, since I don’t know which seed you used for generating the random numbers, but the point remains the same):

> anova(model)

Analysis of Variance Table

Response: x$rand
          Df  Sum Sq Mean Sq F value Pr(>F)
x$factor   2   4.142  2.0708  1.8948 0.1559
Residuals 97 106.009  1.0929


The F-test for the factor simultaneously tests $H_0: \beta_1 = \beta_2 = 0$, i.e., the hypothesis that the factor as a whole is not significant. A common strategy is to first test this omnibus hypothesis and only then dig into which of the factor's levels differ from each other.
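As a sketch of that two-step strategy (with an assumed seed and the `data=` form of lm(), so the exact numbers will differ from yours):

```r
set.seed(42)  # hypothetical seed, not from the original post
x <- data.frame(rand = rnorm(100),
                factor = sample(c("A", "B", "C"), 100, replace = TRUE))
model <- lm(rand ~ factor, data = x)

# Step 1: omnibus F-test for the factor as a whole.
# For a single-factor model this F value is identical to the one
# reported at the bottom of summary(model).
anova(model)

# Step 2: only if the omnibus test is significant, compare the
# individual levels, e.g. with Tukey's HSD (needs an aov object)
TukeyHSD(aov(rand ~ factor, data = x))
```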

Also, you can use the anova() function for full versus reduced model tests. For example:

> x <- data.frame(rand=rnorm(100), factor=sample(c("A","B","C"),100,replace=TRUE),
+                 y1=rnorm(100), y2=rnorm(100))
> model1 <- lm(x$rand ~ x$factor + x$y1 + x$y2)
> model2 <- lm(x$rand ~ x$factor)
> anova(model2, model1)

Analysis of Variance Table

Model 1: x$rand ~ x$factor
Model 2: x$rand ~ x$factor + x$y1 + x$y2
Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     97 105.06
2     95 104.92  2   0.13651 0.0618 0.9401


This compares the full model, containing the factor and the two covariates (y1 and y2), against the reduced model, in which we assume that the slopes of the two covariates are both simultaneously equal to zero.
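That F value can also be computed by hand from the two residual sums of squares, which makes explicit what anova(model2, model1) is doing. A sketch, again with an assumed seed:

```r
set.seed(1)  # hypothetical seed, not from the original post
x <- data.frame(rand = rnorm(100),
                factor = sample(c("A", "B", "C"), 100, replace = TRUE),
                y1 = rnorm(100), y2 = rnorm(100))
model1 <- lm(rand ~ factor + y1 + y2, data = x)  # full model
model2 <- lm(rand ~ factor, data = x)            # reduced model

rss_full    <- sum(resid(model1)^2)
rss_reduced <- sum(resid(model2)^2)
df_diff     <- model2$df.residual - model1$df.residual  # 2 extra slopes

# F = (drop in RSS per extra parameter) / (error variance of full model)
F_stat <- ((rss_reduced - rss_full) / df_diff) /
          (rss_full / model1$df.residual)

F_stat  # matches the F value reported by anova(model2, model1)
```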