I conducted a glm.nb by
glm1<-glm.nb(x~factor(group))
with group being a categorial and x being a metrical variable. When I try to get the summary of the results, I get slightly different results, depending on if I use
summary()
orsummary.glm
.summary(glm1)
gives me... Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.1044 0.1519 0.687 0.4921 factor(gruppe)2 0.1580 0.2117 0.746 0.4555 factor(gruppe)3 0.3531 0.2085 1.693 0.0904 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for Negative Binomial(0.7109) family taken to be 1)
whereas summary.glm(glm1) gives me
... Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.1044 0.1481 0.705 0.4817 factor(gruppe)2 0.1580 0.2065 0.765 0.4447 factor(gruppe)3 0.3531 0.2033 1.737 0.0835 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for Negative Binomial(0.7109) family taken to be 0.9509067)
I understand the meaning of the dispersion parameter, but not of the line
(Dispersion parameter for Negative Binomial(0.7109) family taken to be 0.9509067)
.In the handbook it says, it would be the estimated dispersion, but it seems to be a bad estimate, as 0.95 is not close to 0.7109, or is the estimated dispersion something different than the estimated dispersion parameter?
I guess, I have to set the dispersion in thesummary.nb(x, dispersion=)
to something, but I’m not sure, if I have to set the dispersion to 1 (which will yield the same result assummary()
or if I should insert an estimate of the dispersion parameter, in this case leading tosummary.nb(glm1, dispersion=0.7109)
or something else? Or am I fine with just using thesummary(glm1)
?
Answer
Firstly, you should not use summary.glm
on an object of class "negbin"
. If you look at the function code for summary.glm
, right at the top you’ll see the computation of the dispersion
. Note that summary.glm
only knows about models that can be fitted by glm
and hence it singles out the binomial and Poisson families for special treatment, where the dispersion parameter $\phi$ is assumed to be equal to 1. For models other than these, $\phi$ is computed from the model object, but note that this is based on an assumption that this is appropriate for a family that is not binomial or Poisson. The family
for the model fitted by glm.nb
is "Negative Binomial(theta)"
. Hence when you use summary.glm
on the model fitted by glm.nb
, the in code
if (is.null(dispersion))
dispersion <- if (object$family$family %in% c("poisson",
"binomial"))
1
else if (df.r > 0) {
est.disp <- TRUE
if (any(object$weights == 0))
warning("observations with zero weight not used for calculating dispersion")
sum((object$weights * object$residuals^2)[object$weights >
0])/df.r
}
the test for "poisson"
or "binomial"
fails and it then computes $\phi$ where in actual fact it is assumed to be equal to 1 by default for this family (as per the definition of summary.negbin
.
There is no problem with this, it is just simpler to call the correct method and supply a different value for $\phi$ via argument dispersion
.
Secondly, you misunderstand the output. When you see
Negative Binomial(0.7109)
as I alluded to above, the number quoted in parentheses is $\hat{\theta}$, the parameter of the Negative Binomial distribution. This value is that estimated during fitting. It is not $\phi$, the dispersion parameter, and hence the two numbers should not necessarily be equal; they are just two numbers.
As the computed dispersion $\phi$ (following the code I quote above) is pretty close to one (~0.95), the assumption that $\phi = 1$ used for the standard errors is not too bad in summary.negbin
. You could of course, just do
summary(glm1, dispersion = 0.9509)
and get the additional output that the negbin
method gives you, plus the computed rather than assumed value of $\phi$.
Attribution
Source : Link , Question Author : Renoir Pulitz , Answer Author : Gavin Simpson