[R] Overdispersion in count data

David Winsemius dwinsemius at comcast.net
Thu Apr 3 03:24:23 CEST 2008

"Wade Wall" <wade.wall at gmail.com> wrote in
news:e23082be0804021244q57e359e8ic724b6f619f90153 at mail.gmail.com: 

> Thanks for the recommendations, insights.  I tried using glm.nb, but
> it didn't seem to like my data.  I received the message (subscript)
> logical subscript too long.  I am using the same dataframe as my
> previous glm.  Do you know if I need to put the data in a different
> format? 

I was wondering about your data layout. You said you had the flower/no-
flower data in two different columns. That is not the way I usually 
offer data to glm(). I would have imagined that log(burn_time) would 
have been an offset. It might help if you at least offered the audience 
a sample of ten rows, the results of str() for the data.frame, and the 
call to the glm function.
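
For instance, something along these lines would give readers enough to go on. This is only a sketch: the data frame `dat` and its columns `site`, `burn`, `flowering` and `not_flowering` are made-up stand-ins, since the actual data were not posted.

```r
# Hypothetical stand-in for the poster's data: 24 sites, 3 burn categories.
# None of these names or values come from the original post.
set.seed(1)
dat <- data.frame(
  site          = factor(1:24),
  burn          = factor(rep(c("recent", "mid", "old"), each = 8)),
  flowering     = rpois(24, lambda = 20),
  not_flowering = rpois(24, lambda = 30)
)

head(dat, 10)   # a sample of ten rows
str(dat)        # column types and dimensions

# The call itself, using a two-column (successes, failures) response
fit <- glm(cbind(flowering, not_flowering) ~ burn,
           family = binomial, data = dat)
print(fit$call)
```

Pasting the output of those three pieces into a message usually makes it much easier for others to see how the response is laid out.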

David Winsemius

> On Wed, Apr 2, 2008 at 12:31 PM, Gavin Simpson
> <gavin.simpson at ucl.ac.uk> wrote:
>> On Wed, 2008-04-02 at 12:03 -0400, Wade Wall wrote:
>> > Hi all,
>> >
>> > I have count data (number of flowering individuals plus total 
>> > number of individuals) across 24 sites and 3 treatments (time 
>> > since last burn). Following recommendations in the R Book, I used 
>> > a glm with the model y~ burn, with y being two columns 
>> > (flowering, not flowering) and burn the time (category) since 
>> > burn.  However, the residual deviance is roughly 10 times 
>> > the number of degrees of freedom, and using the quasibinomial 
>> > distribution doesn't change this.  Any suggestions as to why the 
>> > quasibinomial distribution doesn't change the residual deviance 
>> > and how I should proceed?
>> > I know that this level of residual deviance is unacceptable, but
>> > not sure if transformations are in order.

>> The quasi families estimate the dispersion parameter rather than
>> assume it is fixed. This doesn't change the estimates for the
>> coefficients, but it may change their standard errors if the
>> estimated dispersion parameter is different from 1, and hence the
>> test statistics and their p-values. As such, the residual deviance
>> doesn't change; you are just adjusting the interpretation of
>> coefficients to take account of the over-dispersion.
>> If you are not happy with the fitted model there are numerous
>> options you could try, including fitting a negative binomial (NB)
>> GLM (see glm.nb() in package MASS) or a zero-inflated Poisson or NB
>> model or a Hurdle model. Functions to fit the ZIP/ZINB or Hurdle
>> models can be found in the pscl package.
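
The point about the quasi families can be checked on simulated data. The sketch below is an illustration under assumptions, not the poster's analysis: the data are generated with site-to-site variation in the flowering probability (a beta-binomial style setup chosen here only to produce over-dispersion), and every object name is hypothetical.

```r
# Simulated over-dispersed binomial data (hypothetical; not the poster's data)
set.seed(42)
n_site <- 24
burn   <- factor(rep(c("recent", "mid", "old"), each = 8))
total  <- rep(50, n_site)

# Site-level probabilities vary around a burn-specific mean, which
# induces extra-binomial variation.
mu <- c(recent = 0.6, mid = 0.4, old = 0.2)[as.character(burn)]
p  <- rbeta(n_site, shape1 = 5 * mu, shape2 = 5 * (1 - mu))
flowering <- rbinom(n_site, size = total, prob = p)
dat <- data.frame(burn, flowering, not_flowering = total - flowering)

fit_bin   <- glm(cbind(flowering, not_flowering) ~ burn,
                 family = binomial, data = dat)
fit_quasi <- update(fit_bin, family = quasibinomial)

# Coefficients and residual deviance are the same under both families
all.equal(coef(fit_bin), coef(fit_quasi))
all.equal(deviance(fit_bin), deviance(fit_quasi))

# Dispersion estimated by the quasi family (Pearson chi-square / residual df)
phi <- summary(fit_quasi)$dispersion

# Standard errors are rescaled by sqrt(phi)
se_bin   <- coef(summary(fit_bin))[, "Std. Error"]
se_quasi <- coef(summary(fit_quasi))[, "Std. Error"]
all.equal(se_quasi, se_bin * sqrt(phi))
```

Only the standard errors, and hence the test statistics and p-values, differ between the two fits; the point estimates and the residual deviance do not.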
