How to deal with overdispersion in Poisson regression: quasi-likelihood, negative binomial GLM, or subject-level random effect?

I’ve come across three proposals for dealing with overdispersion in a Poisson response variable, starting from an all-fixed-effects model:

  1. Use a quasi model;
  2. Use a negative binomial GLM;
  3. Use a mixed model with a subject-level random effect.

But which should I actually choose, and why? Is there an actual criterion for choosing among these?


Poisson regression is just a GLM:

People often speak of the parametric rationale for applying Poisson regression. In fact, Poisson regression is just a GLM. That means Poisson regression is justified for any type of data (counts, ratings, exam scores, binary events, etc.) when two assumptions are met: 1) the log of the mean outcome is a linear combination of the predictors, and 2) the variance of the outcome equals the mean. These two conditions are referred to as the mean model and the mean-variance relationship, respectively.

The mean-model assumption can be relaxed somewhat by using a richer set of adjustments for the predictors. This is nice because the link function affects the interpretation of the parameters; the subtlety of that interpretation makes the difference between answering a scientific question and completely eluding the consumers of your statistical analysis. In another SE post I discuss the usefulness of log transforms for interpretation.

It turns out, however, that the second assumption (the mean-variance relationship) has strong implications for inference. When the mean-variance relationship does not hold, the parameter estimates remain unbiased, but the standard errors, confidence intervals, p-values, and predictions are all miscalibrated. That means you cannot control the Type I error rate, and you may have suboptimal power.
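To see the miscalibration concretely, here is a minimal numpy-only sketch (the helper name `pearson_dispersion` is my own, and the intercept-only fit is a simplification): the Pearson chi-square statistic divided by its degrees of freedom estimates how far the data depart from the Poisson mean-variance assumption. A value near 1 means Poisson standard errors are about right; a value of 3 means they are roughly sqrt(3) times too small.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Equidispersed data: Poisson with mean 4 (variance also 4).
y_pois = rng.poisson(lam=4.0, size=n)

# Overdispersed data: Gamma rates (mean 4, variance 8) feeding a Poisson,
# so marginally Var(y) = 4 + 8 = 12, three times the mean.
rates = rng.gamma(shape=2.0, scale=2.0, size=n)
y_over = rng.poisson(lam=rates)

def pearson_dispersion(y):
    """Pearson chi-square / df for an intercept-only Poisson fit (mu = ybar)."""
    mu = y.mean()
    return np.sum((y - mu) ** 2 / mu) / (len(y) - 1)

print(round(pearson_dispersion(y_pois), 2))  # near 1: Poisson SEs are fine
print(round(pearson_dispersion(y_over), 2))  # near 3: Poisson SEs are too small
```

In practice you would get this statistic from a fitted model (e.g., residual deviance or Pearson statistic over residual degrees of freedom) rather than computing it by hand.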

What if the mean-variance relationship could be relaxed so that the variance is allowed to exceed the mean, for instance by being proportional to it? Quasipoisson regression and negative binomial regression do this, in slightly different ways.

Quasipoisson models

Quasipoisson models are not likelihood based. They maximize a “quasilikelihood”, which is a Poisson likelihood up to a proportional constant. That proportional constant happens to be the dispersion. The dispersion is treated as a nuisance parameter: while the fitting routine produces an estimate of it, that estimate is merely an artifact of the data rather than a value that generalizes to the population. The dispersion only serves to “shrink” or “widen” the SEs of the regression parameters according to whether the variance is proportionally smaller or larger than the mean. Because the dispersion is treated as a nuisance parameter, quasipoisson models enjoy a host of robust properties: the data can in fact be heteroscedastic (not meeting the proportional mean-variance assumption) and can even exhibit small sources of dependence, and the mean model need not be exactly correct, yet the 95% CIs for the regression parameters are asymptotically correct. If the goal of your data analysis is to measure the association between a set of predictors and the outcome, quasipoisson models are usually the way to go. Their limitations: they cannot yield prediction intervals, the Pearson residuals cannot tell you much about how accurate the mean model is, and information criteria like the AIC or BIC cannot be used to compare them to other types of models.
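Mechanically, the quasipoisson “fix” is simple. Here is a hedged numpy sketch for the intercept-only case (variable names are my own; real software does this inside the IRLS fit): the point estimate is identical to the Poisson one, and the only change is that the model-based SE is multiplied by the square root of the estimated dispersion.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4000

# Overdispersed counts via a Poisson-Gamma mixture: mean 4, variance 12.
y = rng.poisson(rng.gamma(shape=2.0, scale=2.0, size=n))

# Intercept-only Poisson fit: beta0 = log(ybar); Fisher information is n*mu,
# so the model-based SE of beta0 is 1/sqrt(n*mu).
mu = y.mean()
beta0 = np.log(mu)
se_poisson = 1.0 / np.sqrt(n * mu)

# Quasipoisson: estimate the dispersion phi by Pearson chi-square / df,
# then widen the Poisson SE by sqrt(phi). The coefficient is unchanged.
phi = np.sum((y - mu) ** 2 / mu) / (n - 1)
se_quasi = np.sqrt(phi) * se_poisson

print(round(phi, 2))           # near 3 for these simulated data
print(se_quasi > se_poisson)   # the quasipoisson SE is wider
```

Note that if the data were underdispersed (phi < 1), the same formula would shrink the SEs instead.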

Negative binomial models

It’s most useful to understand negative binomial regression as a 2-parameter Poisson regression. The mean model is the same as in Poisson and quasipoisson models: the log of the mean outcome is a linear combination of predictors. In addition, a “scale” parameter models the mean-variance relationship; in the usual (NB2) parameterization the variance is a quadratic function of the mean, Var(Y) = μ + μ²/θ, so it always exceeds the mean when θ is finite. However, unlike quasipoisson models, this type of model is an exact likelihood-based procedure. Here the dispersion is an actual parameter with some extent of generalizability to the population. This introduces a few advantages over quasipoisson but, in my opinion, imposes more (untestable) assumptions. Unlike with quasipoisson models, the data must be independent, the mean model must be correct, and the scale parameter must be homoscedastic across the range of fitted values to obtain correct inference. These assumptions can be assessed somewhat by inspecting Pearson residuals, and in return the model produces viable predictions and prediction intervals and is amenable to comparison with information criteria.

Negative binomial probability models arise from a Poisson-Gamma mixture: an unobserved, fluctuating Gamma random variable “feeds into” the Poisson rate parameter. Since NB GLM fitting is likelihood based, it is usually helpful to state prior beliefs about the data-generating mechanism and connect them to the probabilistic rationale for the model at hand. For instance, if I am modeling the number of racers retiring from 24-hour endurance racing, I might consider that the environmental conditions are all stressors I did not measure and thus contribute to the risk of DNF, such as moisture or cold temperatures affecting tire traction and hence the risk of a spin-out and wreck.
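The Poisson-Gamma mixture is easy to verify by simulation. In this numpy sketch (parameter values chosen for illustration), Gamma rates with mean μ and shape θ feed a Poisson, and the resulting counts have the NB2 variance μ + μ²/θ:

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, theta = 200_000, 4.0, 2.0

# Gamma rates with E[rate] = mu and shape theta feed the Poisson rate,
# giving a negative binomial marginal with Var(Y) = mu + mu**2 / theta
# (here 4 + 16/2 = 12, versus a Poisson variance of 4).
rates = rng.gamma(shape=theta, scale=mu / theta, size=n)
y = rng.poisson(lam=rates)

print(round(y.mean(), 2))  # close to 4
print(round(y.var(), 2))   # close to 12
```

The mixture view is exactly the “unmeasured stressors” story above: the Gamma variable stands in for all the rate-modifying factors you did not record.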

Models for dependent data: GLMMs vs GEE

Generalized linear mixed models (GLMMs) for Poisson data are not directly comparable with the above approaches. GLMMs answer a different question and are used with different data structures. Here sources of dependence between observations are modeled explicitly: GLMMs use random intercepts and random slopes to account for individual-level heterogeneity. This changes what we estimate: the random effects modify both the mean and the variance being modeled, rather than just the variance as in the approaches discussed above.

There are two levels of association that can be measured in dependent data: population-level (marginal) and individual-level (conditional). GLMMs measure individual-level (conditional) associations: given the whole host of individual-level contributors to the outcome, what is the relative effect of a combination of predictors? As an example, exam prep courses may have little effect on children who attend exemplary schools, whereas inner-city children may benefit tremendously. The individual-level effect is then substantially higher in this circumstance, because the advantaged children are already far above the curve in terms of positive exposures.

If we naively applied quasipoisson or negative binomial models to dependent data, the NB inference would be wrong, and the quasipoisson models would be inefficient. The GEE, however, extends the quasipoisson model to explicitly model dependence structures, like the GLMM, but the GEE measures a marginal (population-level) trend and obtains the correct weights, standard errors, and inference.
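The key ingredient of the GEE is the cluster-robust (“sandwich”) standard error. A minimal numpy sketch for the simplest case, an intercept-only model with a working-independence correlation (cluster sizes and frailty parameters are illustrative): score residuals are summed within clusters before squaring, so within-cluster correlation inflates the SE appropriately, while the naive Poisson SE ignores it.

```python
import numpy as np

rng = np.random.default_rng(3)
n_clusters, m = 500, 5

# Clustered counts: a shared Gamma frailty per cluster (mean 4) induces
# positive within-cluster correlation in the Poisson outcomes.
frailty = rng.gamma(shape=2.0, scale=2.0, size=n_clusters)
y = rng.poisson(np.repeat(frailty, m)).reshape(n_clusters, m)

n = y.size
mu = y.mean()                      # intercept-only fit: beta0 = log(mu)
se_naive = 1.0 / np.sqrt(n * mu)   # model-based Poisson SE, ignores clustering

# Sandwich SE under working independence: sum score residuals (y - mu)
# within each cluster first, then combine across clusters.
cluster_scores = (y - mu).sum(axis=1)
bread = n * mu                     # negative derivative of the estimating equation
se_robust = np.sqrt(np.sum(cluster_scores ** 2)) / bread

print(se_robust > se_naive)        # robust SE is wider under clustering
```

Real GEE software additionally estimates a working correlation matrix to improve efficiency, but even this working-independence version gives asymptotically valid marginal inference.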

Data analysis example:

This post is already too long 🙂 There is a nice illustration of the first two models in this tutorial, along with references to more reading if you are interested. The data in question involve the nesting habits of horseshoe crabs: females sit in nests and males (“satellites”) attach to them. The investigators wanted to measure the number of males attached to a female as a function of the female’s characteristics. I hope I’ve underscored why mixed models are noncomparable: if you have dependent data, you must use the correct model for the question those dependent data are trying to answer, either a GLMM or a GEE.


Source: Link, Question Author: Bryan, Answer Author: AdamO