[R] glm.nb versus glm estimation of theta.
Achim.Zeileis at wu-wien.ac.at
Fri Aug 14 01:35:57 CEST 2009
On Thu, 13 Aug 2009, hesicaia wrote:
> I have a question regarding estimation of the dispersion parameter (theta)
> for generalized linear models with the negative binomial error structure. As
The theta is different from the dispersion. In the usual GLM notation:
E[y] = mu
VAR[y] = dispersion * V(mu)
The function V() depends on the family and is
Poisson: V(mu) = mu
NB: V(mu) = mu + 1/theta * mu^2
For both models, dispersion is known to be 1 (from the likelihood).
However, a quasi-Poisson approach can be adopted where dispersion is
estimated (but does not correspond to a specific likelihood).
Thus, dispersion and theta are really different things although both of
them can be used to capture overdispersion.
> I understand, there are two main methods to fit glm's using the nb error
> structure in R: glm.nb() or glm() with the negative.binomial(theta) family.
> Both functions are implemented through the MASS library. Fitting the model
> using these two functions to the same data produces much different results
> for me in terms of estimated theta and the coefficients, and I am not sure
...because you tell them to do different things.
> the following model:
> mod<-glm.nb(count ~ year + season + time + depth,
> estimates theta as 0.0109
This approach estimates theta (= 0.0109) and assumes that dispersion is
known to be 1. The underlying estimated variance function is:
VAR[y] = mu + 91.74312 * mu^2
> while the following model:
> mod2<-glm(count ~ year + season + time + depth,
> will not accept 0.0109 as theta and instead estimates it as 1271 (these are
> fisheries catch data and so are very overdispersed).
This does not estimate theta at all but keeps it fixed (= 100). By
default, however, dispersion will be estimated by the summary() method,
presumably leading to the value of 1271 you report. The underlying
variance function would then be
VAR[y] = 1271 * mu + 12.71 * mu^2
> Fitting a quasipoisson model also yields a large dispersion parameter
> (1300). The models also produce different coefficients and P-values, which
> is disconcerting.
This implies yet another variance function, namely
VAR[y] = 1300 * mu
If you want to get essentially the same result as
from using glm+negative.binomial you can do
mod0 <- glm(count ~ year + season + time + depth, data = dat,
family = negative.binomial(theta = mod$theta),
control = glm.control(maxit = 100))
summary(mod0, dispersion = 1)
(Note that link = "log" is not needed.)
> What am I doing wrong here? I've read through the help sections
> (?negative.binomial,?glm.nb, and ?glm) but did not find any answers.
I guess the authors of the "MASS" package would say that the software
accompanies a book which should be consulted...and they would be right.
Reading the corresponding sections in MASS (the book) will surely be
helpful. Consulting McCullagh & Nelder about some general GLM theory can't
hurt either. Finally, some extended modeling (including excess zeros) is
(Apologies for advertising this twice on the same day.)
> Any help and/or input is greatly appreciated!
> View this message in context: http://www.nabble.com/glm.nb-versus-glm-estimation-of-theta.-tp24956438p24956438.html
> Sent from the R help mailing list archive at Nabble.com.
> R-help at r-project.org mailing list
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help