# Logic behind the ANOVA F-test in simple linear regression

I’m trying to understand the logic behind the ANOVA F-test in simple linear regression analysis. My question is as follows: when the F value, i.e. MSR/MSE, is large, we accept the model as significant. What is the logic behind this?

In the simplest case, when you have only one predictor (simple regression), say $X_1$, the $F$-test tells you whether including $X_1$ explains a larger part of the variance observed in $Y$ than the null model (intercept only). The idea is then to test whether the added explained variation (total sum of squares, TSS, minus residual sum of squares, RSS) is large enough to be considered a “significant quantity”. We are here comparing a model with one predictor, or explanatory variable, to a baseline which is just “noise” (nothing except the grand mean).

Likewise, you can compute an $F$ statistic in a multiple regression setting: in this case, it amounts to a test of all predictors included in the model, which under the hypothesis-testing framework means that we wonder whether any of them is useful in predicting the response variable. This is the reason why you may encounter situations where the $F$-test for the whole model is significant whereas some of the $t$- or $z$-tests associated with each regression coefficient are not.

The $F$ statistic looks like

$$F = \frac{(\text{TSS}-\text{RSS})/(p-1)}{\text{RSS}/(n-p)},$$

where $p$ is the number of model parameters (intercept included) and $n$ the number of observations. This quantity should be referred to an $F_{p-1,n-p}$ distribution for a critical or $p$-value. It applies to the simple regression model as well (there $p=2$, so the reference distribution is $F_{1,n-2}$), and obviously bears some analogy with the classical ANOVA framework.
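
To connect this with the MSR/MSE wording of the question: in simple regression, $\text{MSR}=(\text{TSS}-\text{RSS})/1$ and $\text{MSE}=\text{RSS}/(n-2)$. As a sketch, assume the usual model $y_i=\beta_0+\beta_1x_i+\varepsilon_i$ with independent errors of constant variance $\sigma^2$ (the symbols $\beta_0$, $\beta_1$, $\sigma^2$ and $x_i$ are introduced here for illustration only). The expected mean squares are then

$$E(\text{MSE})=\sigma^2, \qquad E(\text{MSR})=\sigma^2+\beta_1^2\sum_{i=1}^n (x_i-\bar x)^2.$$

When $\beta_1=0$, both mean squares estimate the same quantity $\sigma^2$, so $F$ should be close to 1. When $\beta_1\neq 0$, only the numerator is inflated, so a large $F$ is evidence against the intercept-only model.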

Sidenote.
When you have more than one predictor, you may wonder whether considering only a subset of those predictors “reduces” the quality of model fit. This corresponds to a situation where we consider nested models. It is exactly the same situation as above, where we compared a given regression model with a null model (no predictors included). In order to assess the reduction in explained variance, we can compare the residual sums of squares (RSS) from both models (that is, what is left unexplained once you account for the effect of the predictors present in each model). Let $\mathcal{M}_0$ and $\mathcal{M}_1$ denote the base model (with $p$ parameters) and a model with an additional predictor ($q=p+1$ parameters). Adding a predictor can never increase the RSS, so $\text{RSS}_{\mathcal{M}_0}\ge\text{RSS}_{\mathcal{M}_1}$; if $\text{RSS}_{\mathcal{M}_0}-\text{RSS}_{\mathcal{M}_1}$ is small, we would consider that the smaller model performs as well as the larger one. A good statistic to use would be the ratio of such SS, $(\text{RSS}_{\mathcal{M}_0}-\text{RSS}_{\mathcal{M}_1})/\text{RSS}_{\mathcal{M}_1}$, with each term divided by its degrees of freedom ($q-p$ for the numerator, and $n-q$ for the denominator). As already said, it can be shown that, under the null hypothesis that the extra predictor has no effect, this quantity follows an $F$ (or Fisher-Snedecor) distribution with $q-p$ and $n-q$ degrees of freedom. If the observed $F$ is larger than the corresponding $F$ quantile at a given $\alpha$ (typically, $\alpha=0.05$), then we would conclude that the larger model does a “better job”. (This by no means implies that the model is correct, from a practical point of view!)

A generalization of the above idea is the likelihood ratio test.
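
For a Gaussian linear model the link can be written down explicitly. As a sketch, assume normal errors and a variance estimated by maximum likelihood, and write $F$ for the nested-model statistic $\frac{(\text{RSS}_{\mathcal{M}_0}-\text{RSS}_{\mathcal{M}_1})/(q-p)}{\text{RSS}_{\mathcal{M}_1}/(n-q)}$, where $\mathcal{M}_0$ is the smaller model ($p$ parameters) and $\mathcal{M}_1$ the larger one ($q$ parameters). With $\Lambda$ denoting the likelihood ratio (a symbol introduced here),

$$-2\log\Lambda = n\log\frac{\text{RSS}_{\mathcal{M}_0}}{\text{RSS}_{\mathcal{M}_1}} = n\log\left(1+\frac{q-p}{n-q}\,F\right),$$

so the likelihood ratio statistic is an increasing function of $F$, and the two tests order the evidence in the same way.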

If you are using R, you can play with the above concepts like this:

```r
set.seed(101)  # arbitrary seed, for reproducibility
## two independent standard-normal predictors, named V1 and V2
X  <- as.data.frame(replicate(2, rnorm(100)))
df <- transform(X, y = V1 + V2 + rnorm(100))
## simple regression
anova(lm(y ~ V1, df))           # "ANOVA view"
summary(lm(y ~ V1, df))         # "Regression view"
## multiple regression
summary(lm0 <- lm(y ~ ., df))
lm1 <- update(lm0, . ~ . - V2)  # reduced model
anova(lm1, lm0)                 # test of V2
```
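
To see where the $F$ value reported by `anova()` for two nested models comes from, it can be recomputed from the two residual sums of squares. This is only a minimal sketch on simulated data; the object names (`dat`, `fit0`, `fit1`, `F_by_hand`, and so on) and the seed are illustrative choices, not part of the code above.

```r
## Sketch: nested-model F statistic recomputed by hand (simulated data).
set.seed(42)                          # arbitrary seed, for reproducibility
n   <- 100
dat <- data.frame(V1 = rnorm(n), V2 = rnorm(n))
dat$y <- dat$V1 + dat$V2 + rnorm(n)

fit0 <- lm(y ~ V1, data = dat)        # smaller model
fit1 <- lm(y ~ V1 + V2, data = dat)   # larger model

rss0 <- deviance(fit0)                # RSS of the smaller model
rss1 <- deviance(fit1)                # RSS of the larger model
p <- length(coef(fit0))               # number of parameters, smaller model
q <- length(coef(fit1))               # number of parameters, larger model

## (drop in RSS per extra parameter) / (residual mean square of larger model)
F_by_hand <- ((rss0 - rss1) / (q - p)) / (rss1 / (n - q))
p_by_hand <- pf(F_by_hand, q - p, n - q, lower.tail = FALSE)

tab <- anova(fit0, fit1)              # the same test, as computed by R
all.equal(F_by_hand, tab$F[2])
all.equal(p_by_hand, tab$`Pr(>F)`[2])
```

If the hand computation matches, both `all.equal()` calls should report agreement with the second row of the `anova()` table.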