[R] comparing levels of aggregation with negative binomial models

Ben Bolker bbolker at gmail.com
Tue Nov 16 00:50:29 CET 2010


Ben Raymond <Ben.Raymond <at> rhul.ac.uk> writes:

> 
> Dear R community,
> 
> I would like to compare the degree of aggregation (or dispersion) of  
> bacteria isolated from plant material.  My data are discrete counts  
> from leaf washes.  While I do have xy coordinates for each plant, it  
> is aggregation in the sense of the concentration of bacteria in high  
> density patches that I am interested in.
> My attempt to analyze this was to fit negative binomial GLMs to each  
> of my leaf treatments (using MASS), to compare the estimates of theta,  
> and to use the standard errors to calculate confidence limits.  My  
> values of theta (se) were 0.387 (0.058) and 0.1035 (0.015), which were  
> in the right direction for my hypothesis.  However, some of the stats  
> literature suggests that confidence intervals for theta (or k) are  
> not very robust and that it would be better to calculate confidence  
> intervals for 1/k.  Is there a way I can estimate confidence intervals  
> for 1/k in R, or indeed a more elegant way of looking at aggregation?

   Slightly surprisingly, there isn't a really easy way to do this,
but here's a suggestion. The mle2() function in the bbmle package can
fit negative binomial models (although a bit less efficiently and
stably than glm.nb in the MASS package), and it allows you to construct
a profile confidence interval for the k parameter. Profile confidence
intervals are invariant to transformation of the parameter, so you'd
get the same answer if you wrote the model with 1/k as the parameter.

library(MASS)

## fit a negative binomial GLM to the quine data (from MASS);
## summary() reports theta and its standard error
quine.nb1 <- glm.nb(Days ~ Sex/(Age + Eth*Lrn), data = quine)
summary(quine.nb1)
confint(quine.nb1) ## only gives CIs on the fixed effects, not theta

library(bbmle)
npar <- length(coef(quine.nb1))
## refit the same model by direct maximum likelihood, with k (= theta)
## as an explicit parameter; L-BFGS-B keeps k non-negative
quine.nb1M <- mle2(Days~dnbinom(mu=exp(logmu),size=k),
                   parameters=list(logmu~Sex/(Age+Eth*Lrn)),
                   data=quine,
                   start=list(logmu=1,k=1.6),
                   lower=c(rep(-Inf,npar),0),
                   method="L-BFGS-B")

## check that the two fits agree on the fixed-effect coefficients
all.equal(coef(quine.nb1),coef(quine.nb1M)[1:npar])
## mean relative difference: 1.7e-4
pp <- profile(quine.nb1M,which="k")
plot(pp)    ## inspect the likelihood profile for k
confint(pp) ## profile CI for k
confint(quine.nb1M,method="quad",par="k") ## quadratic approximation
1.59799+c(-1,1)*1.96*0.213 ## Wald CI based on summary(quine.nb1)
## (these values are quine.nb1$theta and quine.nb1$SE.theta)
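
   Since profile intervals are transformation-invariant, a confidence
interval for 1/k falls straight out of the interval for k; a minimal
sketch (assuming confint(pp) returns the two interval endpoints for k):

ci.k <- confint(pp)  ## 95% profile CI for k
sort(1/ci.k)         ## the corresponding 95% CI for 1/k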

  The profile shows that the quadratic approximation (equivalent
to the way that glm.nb estimates the standard error of the
parameter) is pretty good in this case, but you can see how
it works for your data.
   It would be more efficient to write a profiling function that
wrapped glm.nb, but as far as I know that hasn't been done.
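   As a rough sketch of what such a wrapper could look like, you can
refit the model with theta held fixed over a grid of values, using the
negative.binomial() family from MASS, and read the confidence interval
off the profile log-likelihood (prof_theta here is a made-up helper,
not an existing function):

## profile the NB log-likelihood over theta by refitting glm()
## with the fixed-theta negative.binomial() family
prof_theta <- function(formula, data, thetas) {
  sapply(thetas, function(th)
    as.numeric(logLik(glm(formula, family=negative.binomial(th),
                          data=data))))
}
thetas <- seq(1.0, 2.6, length.out=101)
ll <- prof_theta(Days ~ Sex/(Age + Eth*Lrn), quine, thetas)
## theta values within qchisq(0.95,1)/2 log-likelihood units of the
## maximum form an approximate 95% profile CI
range(thetas[ll > max(ll) - qchisq(0.95,1)/2])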
   Another alternative would be to fit a lognormal-Poisson model
in glmer (lme4 package), but that's opening another can of worms ...
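
   For what it's worth, a minimal sketch of that approach: a Poisson
GLMM with an observation-level random effect is one standard way to
fit a lognormal-Poisson model (the obs factor is created here just
for that purpose):

library(lme4)
quine$obs <- factor(seq_len(nrow(quine)))
## Poisson GLMM with an observation-level random intercept; the
## random-effect variance takes the role of the overdispersion
quine.lp <- glmer(Days ~ Sex/(Age + Eth*Lrn) + (1 | obs),
                  family=poisson, data=quine)
summary(quine.lp)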


