I’m experimenting with R and found that anova() needs an object of class lm. But why should I continue with an anova after this:

```
> x <- data.frame(rand=rnorm(100), factor=sample(c("A","B","C"),100,replace=TRUE))
> head(x)
        rand factor
1  0.9640502      B
2 -0.5038238      C
3 -1.5699734      A
4 -0.8422324      B
5  0.2489113      B
6 -1.4685439      A
> model <- lm(x$rand ~ x$factor)
> summary(model)

Call:
lm(formula = x$rand ~ x$factor)

Residuals:
     Min       1Q   Median       3Q      Max
-2.74118 -0.89259  0.02904  0.59726  3.19762

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1878     0.1845  -1.018    0.311
x$factorB    -0.1284     0.2689  -0.477    0.634
x$factorC     0.4246     0.2689   1.579    0.118

Residual standard error: 1.107 on 97 degrees of freedom
Multiple R-squared: 0.04345,	Adjusted R-squared: 0.02372
F-statistic: 2.203 on 2 and 97 DF,  p-value: 0.1160
```

Doesn’t this already tell me everything I need? I’m curious why you would want to continue with anova(model).

**Answer**

Let’s look at what you get when you actually use the anova() function (the numbers differ from your example, since I don’t know which seed you used to generate the random numbers, but the point remains the same):

```
> anova(model)
Analysis of Variance Table

Response: x$rand
          Df  Sum Sq Mean Sq F value Pr(>F)
x$factor   2   4.142  2.0708  1.8948 0.1559
Residuals 97 106.009  1.0929
```
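As a side note (not part of the original answer), for a model with a single predictor the omnibus F-test in the ANOVA table is the same one that summary() prints on its last line, which you can check directly (variable names as above; exact values depend on your random seed):

```
## summary() stores the omnibus test as (F value, numerator df, denominator df);
## anova() reports the same F in the row for the factor
summary(model)$fstatistic
anova(model)["x$factor", "F value"]
```

With additional predictors in the model the two no longer coincide: summary()'s F tests all coefficients jointly, while the ANOVA table gives a separate test per term.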

The F-test for the factor simultaneously tests H0: β1 = β2 = 0, i.e., the hypothesis that the factor as a whole has no effect (here β1 and β2 are the coefficients of the dummy variables for levels B and C). A common strategy is to test this omnibus hypothesis first, before digging into which levels of the factor differ from each other.
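To sketch what "digging down" could look like after a significant omnibus test, one possible follow-up (not from the original answer; any multiplicity correction could be substituted for Holm's) is pairwise comparisons between the factor levels:

```
## pairwise t-tests between levels A, B, and C,
## with Holm-adjusted p-values to account for the three comparisons
pairwise.t.test(x$rand, x$factor, p.adjust.method = "holm")
```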

Also, you can use the anova() function for full versus reduced model tests. For example:

```
> x <- data.frame(rand=rnorm(100), factor=sample(c("A","B","C"),100,replace=TRUE),
y1=rnorm(100), y2=rnorm(100))
> model1 <- lm(x$rand ~ x$factor + x$y1 + x$y2)
> model2 <- lm(x$rand ~ x$factor)
> anova(model2, model1)
Analysis of Variance Table

Model 1: x$rand ~ x$factor
Model 2: x$rand ~ x$factor + x$y1 + x$y2
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     97 105.06
2     95 104.92  2   0.13651 0.0618 0.9401
```

which compares the full model (the factor plus the two covariates y1 and y2) against the reduced model, i.e., it tests the hypothesis that the slopes of the two covariates are simultaneously equal to zero.
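As a convenience (a sketch using base R's update(), not part of the original answer), the reduced model can also be built by dropping terms from the full model instead of retyping the formula:

```
## drop the two covariate terms from the full model's formula;
## this yields the same reduced model as above
model2 <- update(model1, . ~ . - x$y1 - x$y2)
anova(model2, model1)
```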

**Attribution**
*Source: Link, Question Author: Alexander Engelhardt, Answer Author: Wolfgang*