[R] glm.nb versus glm estimation of theta.

hesicaia dboyce at dal.ca
Wed Aug 19 22:26:26 CEST 2009




Bill.Venables wrote:
> 
> Whoa!  Just hang on a minute.
> 
> theta is NOT the dispersion parameter.   Under the NB model, the variance
> of an observation is mu+mu^2/theta, so that's how theta enters the
> picture.  The smaller theta is the larger the variance.
> 
> glm(..., family = negative.binomial(theta = <something>), ...)  
> 
> will NOT estimate theta.  It will estimate a dispersion parameter, and if
> you get your <something> wrong, it could be a silly estimate.  One would
> hope the estimate of the dispersion parameter would be close to unity.
> 
> With your data try
> 
> mod2 <- glm(count ~ year + season + time + depth,
>   family = negative.binomial(theta=mod$theta),link = "log",
>   data = dat, control = glm.control(maxit=100, trace=T))
> 
> You should get the same estimates as you got with the negative binomial
> model (though standard errors, &c, will differ because you have cheated on
> that), and your dispersion parameter should be close to (though not
> necessarily equal to) unity.
> ________________________________________
> From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On
> Behalf Of hesicaia [dboyce at dal.ca]
> Sent: 14 August 2009 04:31
> To: r-help at r-project.org
> Subject: [R]  glm.nb versus glm estimation of theta.
> 
> Hello,
> 
> I have a question regarding estimation of the dispersion parameter (theta)
> for generalized linear models with the negative binomial error structure.
> As
> I understand, there are two main methods to fit glm's using the nb error
> structure in R: glm.nb() or glm() with the negative.binomial(theta)
> family.
> Both functions are implemented through the MASS library. Fitting the model
> using these two functions to the same data produces much different results
> for me in terms of estimated theta and the coefficients, and I am not sure
> why.
> 
> the following model:
> mod<-glm.nb(count ~ year + season + time + depth,
> link="log",data=dat,control=glm.control(maxit=100,trace=T))
> estimates theta as 0.0109
> 
> while the following model:
> mod2<-glm(count ~ year + season + time + depth,
> family=negative.binomial(theta=100),link="log",data=dat,control=glm.control(maxit=100,trace=T))
> will not accept 0.0109 as theta and instead estimates it as 1271 (these
> are
> fisheries catch data and so are very overdispersed).
> 
> Fitting a quasipoisson model also yields a large dispersion parameter
> (1300). The models also produce different coefficients and P-values, which
> is disconcerting.
> 
> What am I doing wrong here? I've read through the help sections
> (?negative.binomial,?glm.nb, and ?glm) but did not find any answers.
> 
> Any help and/or input is greatly appreciated!
> Daniel.
> --
> View this message in context:
> http://www.nabble.com/glm.nb-versus-glm-estimation-of-theta.-tp24956438p24956438.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 
  

Thanks to both respondents for clearing that up for me. I hope it's OK if I
ask a follow up question:
 
The first model sets the dispersion parameter to one and estimates theta:
mod<-glm.nb(count ~ year + season + time +
depth,link="log",data=dat,control=glm.control(maxit=100,trace=T))

The second model requires theta to be specified but then estimates the
dispersion parameter? 
mod2<-glm(count ~ year + season + time + depth,
family=negative.binomial(theta=mod$theta),link="log",data=dat,control=glm.control(maxit=100,trace=T))

Since both theta and the dispersion parameter contribute to the
specification of the mean-variance relationship, shouldn't these two
estimation techniques be equivalent? 
VAR[y] = dispersion * V(mu)
 V(mu) = mu + 1/theta * mu2
 The parameter estimates from these two models are roughly equivalent but
the standard errors and thus statistical significance are much different
thus affecting covariate selection. The estimated dispersion for mod2 was
9.7 and very few covariates were statistically significant, while for mod,
almost all covariates were statistically significant. I was hoping someone
could shed some light on this for me. Am I on the right track here? Which
approach will yield the most accurate estimates and standard errors?

Thanks again,
Dan.

-- 
View this message in context: http://www.nabble.com/glm.nb-versus-glm-estimation-of-theta.-tp24956438p25051352.html
Sent from the R help mailing list archive at Nabble.com.




More information about the R-help mailing list