# What is the need of assumptions in linear regression?

In linear regression, we make the following assumptions:

• The mean of the response, $E(Y_i)$, at each set of values of the predictors, $(x_{1i}, x_{2i}, \dots)$, is a Linear function of the predictors.
• The errors, $ε_i$, are Independent.
• The errors, $ε_i$, at each set of values of the predictors, $(x_{1i}, x_{2i},…)$, are Normally distributed.
• The errors, $ε_i$, at each set of values of the predictors, $(x_{1i}, x_{2i}, \dots)$, have Equal variances (denoted $\sigma^2$).
One of the ways we can solve linear regression is through the normal equations, which we can write as

$$X^TX\hat{\beta} = X^Ty, \qquad \hat{\beta} = (X^TX)^{-1}X^Ty.$$

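To make the questioner's point concrete, here is a minimal sketch (the data and coefficients are illustrative, not from the post) showing that the normal equations produce a least-squares fit with no reference to any distributional assumption, only the invertibility of $X^TX$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: y = 2 + 3*x1 + noise (values chosen arbitrarily)
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])  # design matrix with an intercept column
y = 2 + 3 * x1 + rng.normal(size=n)

# Normal equations: solve (X^T X) beta = X^T y.
# The only mathematical requirement is that X^T X be invertible.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The dedicated least-squares routine gives the same coefficients
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```

In practice `np.linalg.lstsq` (or a QR decomposition) is preferred over forming $X^TX$ explicitly, since $X^TX$ can be badly conditioned; but either way, no assumption about the errors is needed just to compute $\hat{\beta}$.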
From a mathematical standpoint, the above equation only needs $X^TX$ to be invertible. So why do we need these assumptions? I asked a few colleagues, and they said the assumptions are there to get good results, with the normal equations being an algorithm to achieve that. But in that case, how exactly do these assumptions help? How does upholding them lead to a better model?

You are correct: you do not need to satisfy these assumptions to fit a least-squares line to the points. You need these assumptions to interpret the results. For example, assuming there were no relationship between an input $X_1$ and $Y$, what is the probability of getting a coefficient $\hat{\beta}_1$ at least as great as the one we saw from the regression? Answering that question, i.e. running a hypothesis test on the coefficient, relies on the independence, normality, and equal-variance assumptions about the errors.
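As a sketch of that hypothesis test (the simulated data and sample size are my own choices, not from the answer): under the Normal, Independent, Equal-variance assumptions, $\hat{\beta}_1 / \mathrm{se}(\hat{\beta}_1)$ follows a $t$ distribution with $n - p$ degrees of freedom, which is what lets us turn a fitted coefficient into a probability statement.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative data in which X1 truly has no effect on Y
n = 100
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 5 + rng.normal(size=n)  # y does not depend on x1

# Least-squares fit via the normal equations
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Residual variance estimate; the error assumptions make the
# standardized coefficient t-distributed with n - p dof
resid = y - X @ beta_hat
p = X.shape[1]
s2 = resid @ resid / (n - p)
se_beta1 = np.sqrt(s2 * XtX_inv[1, 1])

t_stat = beta_hat[1] / se_beta1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - p)
print(f"beta1 = {beta_hat[1]:.3f}, p-value = {p_value:.3f}")
```

If the error assumptions fail (e.g. correlated or heteroskedastic errors), the arithmetic above still runs, but the $t$ distribution no longer describes the statistic, so the resulting p-value and confidence intervals are not trustworthy. That is the sense in which the assumptions buy you interpretation rather than a fit.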