# Residuals in a linear model are independent but sum to zero; isn’t it a contradiction?

• The sum of the residuals in a linear model equals zero.
• The residuals in a linear model are independent.

The question appears to confuse two meanings of “residual.”

• The first bullet refers to the differences between the data and their fitted values.

• The second bullet refers to a collection of random variables that are used to model the differences between the data and their expectations.

This might become clearer upon examining the simplest possible example: estimating the mean of a population, $\mu$, by taking two independent observations from it (with replacement). The data can be modeled by an ordered pair of random variables $(X_1, X_2)$. Both “fitted values” equal the estimated mean,

$$\bar X = (X_1 + X_2)/2.$$

This number is the fit for each of the two observations.

• The residuals are the differences between the data and the fit. They consist of the ordered pair $$(e_1, e_2) = (X_1 - \bar X, X_2 - \bar X) = ((X_1-X_2)/2, -(X_1-X_2)/2).$$ Consequently $e_2 = -e_1$, showing the residuals are dependent.

• An alternative model of these data uses the random variables $$(\epsilon_1, \epsilon_2) = (X_1 - \mu, X_2 - \mu).$$ Often these random variables are called “errors” but sometimes they are also called “residuals.” Since the $X_i$ are independent, and $\mu$ is just some constant, the $\epsilon_i$ are also independent.
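
To put numbers on these two statements, suppose in addition (an assumption not made above) that the observations share a finite variance $\sigma^2 \gt 0$. A short calculation then gives

$$\begin{aligned}
\operatorname{Var}(e_1) &= \tfrac{1}{4}\operatorname{Var}(X_1 - X_2) = \tfrac{1}{4}(\sigma^2 + \sigma^2) = \tfrac{\sigma^2}{2},\\
\operatorname{Cov}(e_1, e_2) &= \operatorname{Cov}(e_1, -e_1) = -\operatorname{Var}(e_1) = -\tfrac{\sigma^2}{2},\\
\operatorname{Cov}(\epsilon_1, \epsilon_2) &= \operatorname{Cov}(X_1 - \mu, X_2 - \mu) = \operatorname{Cov}(X_1, X_2) = 0.
\end{aligned}$$

Under that assumption the residuals have correlation exactly $-1$, while the errors are uncorrelated, as their independence requires.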

Note that $e_1 + e_2 = 0$ holds for every realization of the data, whereas $\mathbb{E}(\epsilon_1) = \mathbb{E}(\epsilon_2) = 0$ constrains only the averages of the errors. The former is a true dependence among random variables; the latter is merely a constraint on the underlying model and says nothing about how $\epsilon_1$ and $\epsilon_2$ relate to each other.
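
The distinction can also be checked by simulation. The following is only a sketch under assumptions that are not part of the argument above: the population is taken to be normal, and the values of `mu`, the standard deviation, the seed, and `n_sims` are arbitrary choices made for illustration. It draws many pairs $(X_1, X_2)$, forms the residuals and the errors, and compares their behavior.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
mu = 5.0                        # hypothetical population mean (assumed)
sigma = 2.0                     # hypothetical population s.d. (assumed)
n_sims = 100_000                # number of simulated pairs

# Each row is one pair of independent observations (X_1, X_2).
x = rng.normal(loc=mu, scale=sigma, size=(n_sims, 2))

# Residuals: data minus the fitted value (the sample mean of each pair).
xbar = x.mean(axis=1, keepdims=True)
e = x - xbar

# Errors: data minus the true (normally unknown) mean.
eps = x - mu

# Residuals sum to zero within every pair, up to floating-point rounding.
max_abs_residual_sum = np.abs(e.sum(axis=1)).max()

# Across pairs, e_1 and e_2 are perfectly negatively correlated,
# while eps_1 and eps_2 show essentially no correlation.
corr_residuals = np.corrcoef(e[:, 0], e[:, 1])[0, 1]
corr_errors = np.corrcoef(eps[:, 0], eps[:, 1])[0, 1]

# The errors average to roughly zero, but only on average.
mean_errors = eps.mean(axis=0)

print("max |e_1 + e_2|      :", max_abs_residual_sum)
print("corr(e_1, e_2)       :", corr_residuals)
print("corr(eps_1, eps_2)   :", corr_errors)
print("mean of eps_1, eps_2 :", mean_errors)
```

The residual sum should vanish in every simulated pair, whereas the errors' means should only be close to zero, with sampling noise that shrinks as `n_sims` grows.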