What are the error distribution and link functions of a model family in R?

When building models with the glm function in R, one needs to specify the family. A family specifies an error distribution (or variance) function and a link function. For example, when I perform a logistic regression, I use the binomial(link = "logit") family.

What do the error distribution (or variance) function and the link function represent in R?

I assume that the link function is the type of model built (hence the logit function for logistic regression), but I am not too sure about the error distribution function.

I had a look at R’s documentation but could not find detailed information other than how to use them and what parameters can be specified.

Answer

You don’t specify the “error” distribution, you specify the conditional distribution of the response.

Naming a family (such as binomial) specifies the conditional distribution to be binomial, and that in turn implies the variance function (e.g. in the case of the binomial it is \mu(1-\mu)). If you choose a different family you get a different variance function (for Poisson it's \mu, for Gamma it's \mu^2, for Gaussian it's constant, for inverse Gaussian it's \mu^3, and so on).
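As a quick check of the variance functions listed above, R's family objects expose a `variance` component that can be evaluated directly. This is a minimal sketch; the values in `mu` are arbitrary illustrative means, not taken from the question.

```r
# Illustrative means (arbitrary values chosen for this sketch)
mu <- c(0.2, 0.5, 0.8)

# Each family object carries its variance function V(mu)
v_binomial <- binomial()$variance(mu)          # mu * (1 - mu)
v_poisson  <- poisson()$variance(mu)           # mu
v_gamma    <- Gamma()$variance(mu)             # mu^2
v_gaussian <- gaussian()$variance(mu)          # constant (1 for every mu)
v_invgauss <- inverse.gaussian()$variance(mu)  # mu^3

print(rbind(v_binomial, v_poisson, v_gamma, v_gaussian, v_invgauss))
```

These are the variance functions only; the actual conditional variance also involves the dispersion parameter (fixed at 1 for binomial and Poisson, estimated for the others).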

[For some cases (e.g. logistic regression) you can take a latent-variable approach to the GLM – and in that case, you might possibly regard the distribution of the latent variable as a form of “error distribution”.]
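The latent-variable view can be illustrated with a small simulation. This is only a sketch under assumed values: the coefficients (-0.5 and 1.2), the sample size and the seed are made up for illustration. The response is 1 whenever a latent variable, equal to the linear predictor plus standard logistic noise, exceeds zero, and a logistic regression should then roughly recover the coefficients.

```r
set.seed(1)   # arbitrary seed, for reproducibility of the sketch
n <- 5000     # assumed sample size
x <- rnorm(n)

# Latent variable: linear predictor plus standard logistic "error"
# (the coefficients -0.5 and 1.2 are illustrative assumptions)
y_star <- -0.5 + 1.2 * x + rlogis(n)

# We only observe whether the latent variable exceeds zero
y <- as.integer(y_star > 0)

# Logistic regression on the observed binary response
fit <- glm(y ~ x, family = binomial(link = "logit"))
est <- coef(fit)
print(est)    # estimates should be close to -0.5 and 1.2
```

Swapping `rlogis` for `rnorm` and the logit link for probit would give the corresponding probit version of the same idea.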

The link function determines how the mean (\mu) and the linear predictor (\eta=X\beta) are related. Specifically, if \eta=g(\mu) then g is called the link function.
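To make the relationship between \mu and \eta concrete, a family object in R carries both the link g (as `linkfun`) and its inverse (as `linkinv`). The sketch below uses the logit link with a few arbitrary means chosen for illustration.

```r
fam <- binomial(link = "logit")

# Arbitrary illustrative means on the probability scale
mu <- c(0.1, 0.5, 0.9)

# eta = g(mu): for the logit link this is log(mu / (1 - mu))
eta <- fam$linkfun(mu)

# mu = g^{-1}(eta): the inverse link maps the linear predictor back to the mean
mu_back <- fam$linkinv(eta)

print(cbind(mu, eta, mu_back))
```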

You can find tables of the variance functions and the canonical link functions (which have some convenient properties) for commonly-used members of the exponential class in many standard books as well as all over the place on the internet. Here’s a small one:

\begin{array}{lcll}
\textit{Family} & \textit{Variance function} & \textit{Canonical link function} & \textit{Other common links} \\
\hline
\text{Gaussian} & \text{constant} & \mu \quad \text{(identity)} & \\
\text{Binomial} & \mu(1-\mu) & \log\left(\frac{\mu}{1-\mu}\right) \quad \text{(logit)} & \text{probit, cloglog} \\
\text{Poisson} & \mu & \log(\mu) \quad \text{(log)} & \text{identity} \\
\text{Gamma} & \mu^2 & 1/\mu \quad \text{(inverse)} & \text{log} \\
\text{Inverse Gaussian} & \mu^3 & 1/\mu^2 & \text{log}
\end{array}

(R implements these in a fairly typical fashion and, for the families listed above, uses the canonical link if you don't specify one.)
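The default links can be inspected directly from the family objects. The following is a small sketch; the model on the built-in `mtcars` data (`am ~ wt`) is just an illustrative choice, not something from the question.

```r
# Default (canonical) link reported by each family when none is given
default_links <- c(
  gaussian         = gaussian()$link,
  binomial         = binomial()$link,
  poisson          = poisson()$link,
  Gamma            = Gamma()$link,
  inverse.gaussian = inverse.gaussian()$link
)
print(default_links)

# A non-canonical link has to be requested explicitly
probit_link <- binomial(link = "probit")$link

# Omitting the link is the same as asking for the canonical one
fit_default <- glm(am ~ wt, data = mtcars, family = binomial)
fit_logit   <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))
same_fit <- isTRUE(all.equal(coef(fit_default), coef(fit_logit)))
print(same_fit)
```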

Attribution
Source : Link , Question Author : user5365075 , Answer Author : gung – Reinstate Monica
