[R] Overdispersion in count data
David Winsemius
dwinsemius at comcast.net
Thu Apr 3 03:24:23 CEST 2008
"Wade Wall" <wade.wall at gmail.com> wrote in
news:e23082be0804021244q57e359e8ic724b6f619f90153 at mail.gmail.com:
> Thanks for the recommendations and insights. I tried using glm.nb(),
> but it didn't seem to like my data. I received the error message
> "(subscript) logical subscript too long". I am using the same data
> frame as in my previous glm. Do you know if I need to put the data in
> a different format?
I was wondering about your data layout. You said you had the flower/no-
flower data in two different columns. That is not the way I usually
offer data to glm(). I would have imagined that log(burn_time) would
have been an offset. It might help if you at least offered the audience
a sample of ten rows, the results of str() for the data.frame, and the
call to the glm function.
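
For what it is worth, the kind of report asked for above might look
something like the sketch below. The data frame `dat` and its column
names (site, burn, flowering, not_flowering) are invented stand-ins,
not the poster's actual data, and the counts are simulated.

```r
# Hypothetical stand-in for the poster's data; every name here is assumed.
set.seed(1)
dat <- data.frame(
  site          = factor(1:24),
  burn          = factor(rep(c("1yr", "2yr", "3yr"), each = 8)),
  flowering     = rpois(24, lambda = 10),
  not_flowering = rpois(24, lambda = 30)
)

# A sample of ten rows
print(head(dat, 10))

# Column types and factor levels
str(dat)

# The model call, with the two-column (successes, failures) response
fit <- glm(cbind(flowering, not_flowering) ~ burn,
           family = binomial, data = dat)
print(fit$call)
```

Pasting the printed output of these three pieces into a message is usually
enough for a reader to see how the response is laid out.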
--
David Winsemius
> On Wed, Apr 2, 2008 at 12:31 PM, Gavin Simpson
> <gavin.simpson at ucl.ac.uk> wrote:
>
>> On Wed, 2008-04-02 at 12:03 -0400, Wade Wall wrote:
>> > Hi all,
>> >
>> > I have count data (number of flowering individuals plus total
>> > number of individuals) across 24 sites and 3 treatments (time
>> > since last burn). Following recommendations in the R Book, I used
>> > a glm with the model y~ burn, with y being two columns
>> > (flowering, not flowering) and burn the time (category) since
>> > burn. However, the residual deviance is roughly 10 times
>> > the number of degrees of freedom, and using the quasibinomial
>> > distribution doesn't change this. Any suggestions as to why the
>> > quasibinomial distribution doesn't change the residual deviance
>> > and how I should proceed?
>> >
>> > I know that this level of residual deviance is unacceptable, but
>> > I am not sure if transformations are in order.
>>
>> The quasi families estimate the dispersion parameter rather than
>> assume it is fixed. This doesn't change the estimates for the
>> coefficients, but it may change their standard errors if the
>> estimated dispersion parameter is different from 1, and hence the
>> test statistics and their p-values. As such, the residual deviance
>> doesn't change; you are just adjusting the interpretation of the
>> coefficients to take account of the over-dispersion.
>>
>> If you are not happy with the fitted model, there are numerous
>> options you could try, including fitting a negative binomial (NB)
>> GLM (see glm.nb() in package MASS), a zero-inflated Poisson or NB
>> model, or a hurdle model. Functions to fit the ZIP/ZINB or hurdle
>> models can be found in the pscl package.
>>
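
The point made above about the quasi families can be checked on simulated
data. The sketch below is only an illustration under assumed settings: the
counts are generated from a beta-binomial scheme chosen to be
overdispersed, and the variable names, group labels and probabilities are
made up rather than taken from the original data.

```r
# Simulated, deliberately overdispersed data: 24 sites, 3 burn categories.
set.seed(42)
n_site <- 24
burn   <- factor(rep(c("short", "medium", "long"), each = 8))
total  <- rep(50, n_site)

# Site-level flowering probabilities vary around a group mean (assumed values)
p_mean <- c(short = 0.2, medium = 0.4, long = 0.6)[as.character(burn)]
p_site <- rbeta(n_site, shape1 = 5 * p_mean, shape2 = 5 * (1 - p_mean))

flowering     <- rbinom(n_site, size = total, prob = p_site)
not_flowering <- total - flowering

fit_bin   <- glm(cbind(flowering, not_flowering) ~ burn, family = binomial)
fit_quasi <- glm(cbind(flowering, not_flowering) ~ burn, family = quasibinomial)

# Coefficients and residual deviance are the same under both families
print(all.equal(coef(fit_bin), coef(fit_quasi)))
print(all.equal(deviance(fit_bin), deviance(fit_quasi)))

# Dispersion estimate: Pearson chi-square divided by residual df
phi <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)
print(phi)
print(summary(fit_quasi)$dispersion)

# Standard errors under quasibinomial are inflated by sqrt(phi)
se_bin   <- sqrt(diag(vcov(fit_bin)))
se_quasi <- sqrt(diag(vcov(fit_quasi)))
print(se_quasi / se_bin)
print(sqrt(phi))
```

Only the standard errors, and hence the test statistics and p-values,
differ between the two fits; the deviance reported by summary() is
untouched, which is why switching families alone does not make it shrink.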