Can anyone please shed some light on the relationship between OLS and generalised linear models?
Does it have to do with the distribution of the error terms? That is, does the general linear model require normally distributed errors (the foundation of least squares), while generalised linear models don't make the same assumption?
Have I missed something?
In the context of generalized linear models (GLMs), OLS is a special case of a GLM: the distribution of the error terms is normal (Gaussian) and the link function is the identity function.
Generalized linear models allow for different error distributions and also allow the dependent (or response) variable to have a different relationship with the independent variables. This relationship is encoded in the link function, which connects the mean of the response to the linear combination of the independent variables. This allows for modelling count, binary, or multinomial outcomes.
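As a rough illustration of a non-identity link, here is a minimal sketch of a Poisson GLM with a log link fitted to simulated counts. The seed, sample size, and true coefficients (0.5 and 1.2) are arbitrary choices made for this example only.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

# Simulated counts whose log-mean is linear in x (assumed coefficients 0.5 and 1.2)
x <- seq(0, 2, length.out = 200)
counts <- rpois(200, lambda = exp(0.5 + 1.2 * x))

# Poisson error distribution with a log link: log(E[counts | x]) = b0 + b1 * x
fit_pois <- glm(counts ~ x, family = poisson(link = "log"))
coef(fit_pois)

# The link function maps between the two scales:
# the fitted mean is the inverse link (exp) applied to the linear predictor
eta <- predict(fit_pois, type = "link")      # linear predictor
mu  <- predict(fit_pois, type = "response")  # fitted mean
all.equal(unname(mu), unname(exp(eta)))
```

Swapping `family` for `binomial(link = "logit")` would give logistic regression for a binary response in the same way.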
Below is an example using R to show that OLS is a special case of GLM:
# create data
x <- 1:20
y <- 2*x + 3 + rnorm(20)

# OLS
lm(y~x)
lm(formula = y ~ x)
# GLM
glm(y~x, family=gaussian(identity))
Call: glm(formula = y ~ x, family = gaussian(identity))
Degrees of Freedom: 19 Total (i.e. Null); 18 Residual
Null Deviance: 2717
Residual Deviance: 28.98 AIC: 70.18
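To check the equivalence numerically rather than by eye, a short sketch along the following lines compares the two fits directly. The seed is an arbitrary addition for reproducibility, so the numbers will differ from the output above.

```r
set.seed(42)  # arbitrary seed, not part of the original example
x <- 1:20
y <- 2*x + 3 + rnorm(20)

fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

# The estimated coefficients agree up to numerical tolerance
all.equal(coef(fit_lm), coef(fit_glm))

# For the Gaussian family, the residual deviance is the residual sum of squares
all.equal(deviance(fit_glm), sum(residuals(fit_lm)^2))
```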
It is important to note that OLS can also be viewed mathematically as the linear projection of the dependent variable onto the independent variables in a manner that minimizes the squared distance from the projection to the observations. From this bare-bones viewpoint, the assumption of normality in the conditional error term is irrelevant.
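A minimal sketch of this projection view, computing the coefficients from the normal equations and the fitted values from the projection ("hat") matrix. The noise here is deliberately non-normal (a shifted exponential, chosen only for illustration) to show that the algebra does not depend on normality.

```r
set.seed(7)  # arbitrary seed
x <- 1:20
y <- 2*x + 3 + (rexp(20) - 1)  # skewed, mean-zero noise (an assumption for this sketch)

X <- cbind(1, x)  # design matrix with an intercept column

# Normal equations: (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Projection matrix onto the column space of X, and the projected y
H <- X %*% solve(crossprod(X)) %*% t(X)
y_hat <- H %*% y

# Both match what lm() returns
fit <- lm(y ~ x)
all.equal(as.vector(beta_hat), unname(coef(fit)))
all.equal(as.vector(y_hat), unname(fitted(fit)))

# Residuals are orthogonal to the columns of X (zero up to rounding error)
crossprod(X, y - y_hat)
```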
However, to make statistical inferences, assumptions must be added. Either normality of the error term or, more typically, the central limit theorem is invoked for the purpose of inference following estimation of an OLS model.
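As a hedged illustration of the two routes to inference, the sketch below computes confidence intervals for the same OLS fit in two ways: `confint()` on an `lm` object uses t quantiles, which are exact when the errors are normal, while `confint.default()` uses normal quantiles, the kind of interval a large-sample argument would motivate. With only 20 observations this is purely illustrative, and the seed is arbitrary.

```r
set.seed(123)  # arbitrary seed
x <- 1:20
y <- 2*x + 3 + rnorm(20)
fit <- lm(y ~ x)

# t-based intervals: exact under normally distributed errors
ci_t <- confint(fit)

# Normal-quantile (Wald) intervals: rely on an asymptotic approximation
ci_z <- confint.default(fit)

ci_t
ci_z

# The t-based intervals are slightly wider in a sample this small
(ci_t[, 2] - ci_t[, 1]) > (ci_z[, 2] - ci_z[, 1])
```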