I am carrying out a Poisson regression with the end goal of comparing (and taking the difference of) the predicted mean counts between two factor levels in my model, $\hat{\mu}_1 - \hat{\mu}_2$, while holding the other model covariates (which are all binary) constant. I was wondering if anyone could provide some practical advice on when to use a log link versus an identity link. What are the pros and cons of these two link functions in Poisson regression, given my goal of comparing differences?
I also have the same goal in mind for a logistic/binomial regression, where the choice is between a logit link and an identity link: I want to compare the difference in proportions between two factor levels and need similar advice. I’ve read some of the posts here that touch on this issue, but none seem to explain why or when one might choose one link over the other and what the pros and cons might be. Thanks in advance for your help!
I also realize that the main purpose of using certain link functions is to restrict the range of possible predicted values to the range of the mean response (e.g. for the logit link, predictions are restricted to lie between 0 and 1, and for the log link, predictions are restricted to be positive numbers). So, I guess what I’m asking is this: if I use an identity link for, say, a logistic/binomial regression, and my results are within the range (0,1), is there really any need to use a logit link function, or could I just make things simpler and use an identity link?
Cons of an identity link in the case of the Poisson regression are:
- As you have mentioned, it can produce out-of-range predictions.
- You may get weird errors and warnings when attempting to fit the model, because the identity link permits lambda to be less than 0, and the Poisson distribution is not defined for such values.
- As Poisson regression assumes that the mean and variance are the same, when you change the link you are also changing assumptions about the variance. My experience has been that this last point is most telling.
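The first two points can be seen in a toy calculation. The following is a minimal sketch in Python, using made-up coefficients (`b0`, `b_group` and `b_cov` are hypothetical, not estimates from any data). With an identity link the linear predictor is the mean itself, so nothing stops it from going negative for some combination of covariates, and at that point the Poisson log-probability cannot be evaluated.

```python
import math

def poisson_logpmf(y, mu):
    """Log of the Poisson probability of count y given mean mu."""
    if mu <= 0:
        # log(mu) is undefined here; a fitting routine that wanders into
        # this region will typically stop with an error or a warning.
        raise ValueError(f"Poisson mean must be positive, got {mu}")
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Hypothetical identity-link coefficients (illustration only).
b0, b_group, b_cov = 0.5, 1.0, -0.8

def identity_mean(group, cov):
    """Mean under an identity link: the linear predictor itself."""
    return b0 + b_group * group + b_cov * cov

# Walk through the four cells of a 2 x 2 design with an observed count of 2.
for group in (0, 1):
    for cov in (0, 1):
        mu = identity_mean(group, cov)
        try:
            print(group, cov, round(mu, 3), round(poisson_logpmf(2, mu), 3))
        except ValueError as err:
            print(group, cov, round(mu, 3), "->", err)
```

With these particular numbers the cell `group = 0, cov = 1` has a negative mean, so the likelihood is undefined there; a log link cannot produce this situation because `exp()` is always positive.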
But, ultimately this is an empirical question. Fit both models. Perform whatever checks you like. If the identity link has a lower AIC, and does as well or better on all your other checks, then run with the identity link.
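To make the AIC comparison concrete, here is a rough sketch in Python of the arithmetic involved. The counts and both sets of fitted means are toy numbers invented for illustration; in practice the fitted means would come from your two fitted models (in R, for example, `glm` with `family = poisson(link = "log")` and `family = poisson(link = "identity")`). Both hypothetical models are assumed to have three parameters.

```python
import math

def poisson_loglik(counts, means):
    """Poisson log-likelihood of observed counts given fitted means (all > 0)."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1)
               for y, m in zip(counts, means))

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_params - 2 * loglik

# Toy data for illustration only: observed counts and the fitted means
# that two hypothetical models produced for the same observations.
counts = [2, 5, 1, 9]
fitted_log = [2.1, 4.6, 1.4, 8.9]     # pretend output of the log-link model
fitted_ident = [1.5, 5.5, 2.0, 8.0]   # pretend output of the identity-link model

aic_log = aic(poisson_loglik(counts, fitted_log), n_params=3)
aic_ident = aic(poisson_loglik(counts, fitted_ident), n_params=3)

print("log link AIC:", round(aic_log, 2))
print("identity link AIC:", round(aic_ident, 2))
print("lower AIC:", "log" if aic_log < aic_ident else "identity")
```

Because both models use the same Poisson likelihood and the same data, their AIC values are directly comparable; the link only changes how the fitted means are produced.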
In the case of the logit model vs the linear probability model (i.e., what you refer to as the identity link), the situation is a lot more straightforward. Except for some very exotic cases in econometrics (which you will find if you do a search), the logit model is better: it makes fewer assumptions and is what most people use. Using the linear probability model in its place would verge on being perverse.
As regards interpreting the models, if you are using R, there are two great packages that will do all the heavy lifting: effects, which is super easy to use, and Zelig, which is harder to use but great if you want to make predictions.
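Whatever tool you use, the quantity you are after is the predicted mean at one factor level minus the predicted mean at the other, with the remaining covariates held fixed. The sketch below (Python, with hypothetical coefficients `b0`, `b_factor` and `b_cov` chosen purely for illustration) shows one practical consequence of the link choice: under an identity link that difference is just the factor's coefficient, whereas under a log link it changes with the values at which the other covariates are held, and only the ratio of means stays constant.

```python
import math

# Hypothetical coefficients on the link scale (illustration only):
# intercept, effect of the factor level of interest, effect of one binary covariate.
b0, b_factor, b_cov = 0.2, 0.7, 0.5

def mean_log_link(factor, cov):
    """Predicted mean when the linear predictor is on the log scale."""
    return math.exp(b0 + b_factor * factor + b_cov * cov)

def mean_identity_link(factor, cov):
    """Predicted mean when the linear predictor is the mean itself."""
    return b0 + b_factor * factor + b_cov * cov

def difference(mean_fn, cov):
    """Predicted mean at factor = 1 minus factor = 0, covariate held at cov."""
    return mean_fn(1, cov) - mean_fn(0, cov)

for cov in (0, 1):
    print("cov =", cov,
          "| log link diff:", round(difference(mean_log_link, cov), 3),
          "| identity link diff:", round(difference(mean_identity_link, cov), 3))
```

So with a log link you need to decide, and report, the covariate values at which the difference is evaluated (or average over them), while with an identity link a single number summarises it.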