How can Poisson GLM work with non-count data (rate data)? [duplicate]

My question is related to, but not the same as, the following question: Fitting a Poisson GLM in R – issues with rates vs. counts

Here’s some fake data:

### some fake data
x <- 1:14
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
y_rate <- y / 1000

I’m going to use a Poisson GLM with a log link to predict y_rate:

### model
pois_mdl <- glm(y_rate ~ x, family=poisson(link="log"))
summary(pois_mdl)

Plot the fit:

### plot
plot(x, y_rate)
lines(x, pois_mdl$fitted.values)

I am surprised that Poisson glm() allows for non-integer values in the dependent variable. Draws from a Poisson distribution are always integers (regardless of the value of the mean parameter). Why doesn’t glm() blow up?

Answer

I don’t know why glm() doesn’t blow up. To figure that out, you’ll have to unpack all of the underlying code. (In addition, if your only question is how the R code works, this question is off topic here.)
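One way to probe this without unpacking all of the source: glm() is documented to fit by iteratively reweighted least squares (IRLS), and that algorithm only needs the family's link and variance function, not integer responses. The sketch below is my own hand-rolled IRLS loop, not R's actual implementation; the starting values and the fixed 50 iterations are assumptions chosen for simplicity. As far as I can tell, the only place the Poisson density itself is evaluated is the AIC calculation, which is where the "non-integer x" warnings come from.

```r
# Same fake data as in the question
x <- 1:14
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
y_rate <- y / 1000

# Hand-rolled IRLS for a log link with variance function V(mu) = mu.
# Nothing in these steps requires y_rate to be an integer.
X <- cbind(1, x)
mu <- y_rate + 0.1            # assumed starting values: any positive guess works
eta <- log(mu)
for (i in 1:50) {             # fixed iteration count instead of a convergence test
  z <- eta + (y_rate - mu) / mu   # working response
  w <- mu                         # working weights
  beta <- solve(t(X) %*% (w * X), t(X) %*% (w * z))
  eta <- drop(X %*% beta)
  mu <- exp(eta)
}

# Compare with glm(); warnings about non-integer values are suppressed here
glm_beta <- suppressWarnings(coef(glm(y_rate ~ x, family = poisson(link = "log"))))
print(cbind(irls = drop(beta), glm = glm_beta))
```

If the two columns agree, that supports the idea that the point estimates come from a quasi-likelihood style fit, which is well defined for non-integer responses even though the Poisson distribution is not.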

What I can say is that you are not modeling the rates correctly. If you want to model rates instead of counts, you need to include an offset in the model’s formula. (There is a nice discussion on CV of what an offset is here: When to use an offset in a Poisson regression?) Using your example, the code would be:

pois_mdl2 <- glm(y~x+offset(log(rep(1000,14))), family=poisson(link="log"))
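To make the role of the offset concrete, here is a small sketch of the same model with the exposure stored in a variable. The names `exposure`, `fitted_rate`, and `log_rate` are mine, not from the question. It shows that the fitted values are expected counts, and that removing the offset from the linear predictor gives the log rate.

```r
x <- 1:14
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
exposure <- rep(1000, length(y))   # hypothetical name for the denominator

# Counts as the response, log(exposure) as the offset
pois_mdl2 <- glm(y ~ x + offset(log(exposure)), family = poisson(link = "log"))

# Fitted values are expected counts; dividing by exposure gives fitted rates
fitted_rate <- fitted(pois_mdl2) / exposure

# The linear predictor includes the offset, so subtract it to get the log rate
log_rate <- predict(pois_mdl2, type = "link") - log(exposure)

# Both routes should give the same rates
stopifnot(isTRUE(all.equal(unname(fitted_rate), unname(exp(log_rate)))))
```

With a constant exposure the offset only shifts the intercept, but the same formula works unchanged when the exposure differs from row to row.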

Note that, although the coefficient estimates are the same, the standard errors are quite different:

summary(pois_mdl2)$coefficients
#               Estimate Std. Error   z value      Pr(>|z|)
# (Intercept) -6.5681214 0.25118701 -26.14833 1.029521e-150
# x            0.2565236 0.02203911  11.63947  2.596237e-31
summary(pois_mdl)$coefficients
#               Estimate Std. Error    z value  Pr(>|z|)
# (Intercept) -6.5681214  7.9431516 -0.8268911 0.4082988
# x            0.2565236  0.6969324  0.3680753 0.7128171
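The size of the gap is not arbitrary. With the dispersion fixed at 1, the Poisson family takes the variance to equal the mean, so dividing the response by 1000 shrinks the working weights by a factor of 1000 and inflates every standard error by about sqrt(1000) ≈ 31.6. A quick check of that reasoning, refitting both models from scratch (the names `se_rate` and `se_offset` are mine):

```r
x <- 1:14
y <- c(0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 45)
y_rate <- y / 1000

# Rate as the response (warnings about non-integer values suppressed)
pois_mdl <- suppressWarnings(glm(y_rate ~ x, family = poisson(link = "log")))
# Count as the response with an offset
pois_mdl2 <- glm(y ~ x + offset(log(rep(1000, 14))), family = poisson(link = "log"))

se_rate   <- summary(pois_mdl)$coefficients[, "Std. Error"]
se_offset <- summary(pois_mdl2)$coefficients[, "Std. Error"]

# Both ratios should be close to sqrt(1000)
print(se_rate / se_offset)
print(sqrt(1000))
```

In other words, the rate-only model behaves as if each observation carried the information of a count 1000 times smaller than the one actually observed, which is why its z values look so unimpressive.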

Attribution
Source: Link, Question Author: William Chiu, Answer Author: Community
