[R-sig-ME] overdispersion with binomial data?

Sat Feb 12 19:09:48 CET 2011

Although the claim that binary data cannot be overdispersed "by 
definition" sounds reasonable, in practice it means little.

Consider a grouped-data study in which each group has an n and an x, 
the numbers of trials and successes in the group. This typically leads 
to overdispersion, because outcomes within a group are positively 
correlated.
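
A standard identity makes the mechanism concrete. As a simplifying 
assumption (not something stated above), let the n binary outcomes 
y_1, ..., y_n in a group share a common success probability p and a 
common pairwise correlation rho. Then, in LaTeX notation:

```latex
\operatorname{Var}(x)
  = \operatorname{Var}\Bigl(\sum_{j=1}^{n} y_j\Bigr)
  = n\,p(1-p) + n(n-1)\,\rho\,p(1-p)
  = n\,p(1-p)\,\bigl[1 + (n-1)\rho\bigr]
```

With rho > 0 and n > 1 the variance exceeds the binomial value 
n p (1 - p); with n = 1 the bracketed factor is 1 whatever rho is.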

Now "explode" the groups into individual binary data, with n such 
rows for each group: x success rows and n-x failure rows. The 
resulting binary data cannot "by definition" be overdispersed.
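
For concreteness, here is a minimal sketch of the "explode" step in 
base R. The data frame and the helper name explode() are made up for 
illustration; only the n / x layout comes from the description above.

```r
# Hypothetical grouped data: one row per group, n trials and x successes.
grouped <- data.frame(group = factor(1:4),
                      n = c(5, 8, 6, 10),
                      x = c(1, 7, 2, 9))

# One binary row per trial: x rows with y = 1 and n - x rows with y = 0.
explode <- function(d) {
  data.frame(group = rep(d$group, times = d$n),
             y = unlist(Map(function(n, x) rep(c(1, 0), times = c(x, n - x)),
                            d$n, d$x)))
}

binary <- explode(grouped)
nrow(binary)                         # total number of trials, sum(grouped$n)
tapply(binary$y, binary$group, sum)  # successes per group, same as grouped$x
```

The "group" column is carried along on every binary row, which is what 
lets it serve as a cluster variable later.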

This is, however, just a shell game. The overdispersion in the first 
dataset has become clustering in the second dataset. The cluster 
variable is "group". The same effect is there, just as a different 
term in the model.
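
A rough way to see this in R with plain glm(). The data are simulated, 
and the seed, group sizes, and amount of group-level variation are 
arbitrary assumptions, not anyone's real data:

```r
set.seed(1)
# Hypothetical data: 30 groups of 20 trials each. The success probability
# varies between groups, which induces positive within-group correlation.
g <- 30
n <- rep(20, g)
p <- plogis(rnorm(g, mean = 0, sd = 1))
x <- rbinom(g, size = n, prob = p)
grouped <- data.frame(group = factor(seq_len(g)), n = n, x = x)
binary <- data.frame(group = rep(grouped$group, times = n),
                     y = unlist(Map(function(n, x) rep(c(1, 0), c(x, n - x)),
                                    n, x)))

# The same intercept-only model fitted to both layouts.
fit_grouped <- glm(cbind(x, n - x) ~ 1, family = binomial, data = grouped)
fit_binary  <- glm(y ~ 1, family = binomial, data = binary)

# Pearson chi-square divided by the residual degrees of freedom.
pearson_ratio <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

coef(fit_grouped); coef(fit_binary)  # same estimate from both layouts
pearson_ratio(fit_grouped)           # well above 1 for data like these
pearson_ratio(fit_binary)            # N / (N - 1), i.e. essentially 1
```

The binary fit shows no overdispersion, yet nothing about the data has 
changed: the extra variation is simply waiting to be picked up by a 
"group" term.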

Including an "observation" variable in the grouped model to deal with 
overdispersion is equivalent to adding "group" as a clustering variable 
in the binary dataset.
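
A sketch of that equivalence, assuming the lme4 package is installed 
(the code is skipped otherwise). The data are simulated and the names 
obs, group, m_grouped and m_binary are illustrative only. The two 
likelihoods differ only by the constant binomial coefficients, so the 
estimates should agree up to optimizer tolerance:

```r
set.seed(2)
# Hypothetical data: 40 groups of 15 trials with group-level variation.
g <- 40
n <- rep(15, g)
x <- rbinom(g, size = n, prob = plogis(rnorm(g, sd = 1)))
grouped <- data.frame(obs = factor(seq_len(g)), n = n, x = x)
binary <- data.frame(group = rep(grouped$obs, times = n),
                     y = unlist(Map(function(n, x) rep(c(1, 0), c(x, n - x)),
                                    n, x)))

if (requireNamespace("lme4", quietly = TRUE)) {
  # Grouped layout: observation-level random effect, one level per row.
  m_grouped <- lme4::glmer(cbind(x, n - x) ~ 1 + (1 | obs),
                           family = binomial, data = grouped)
  # Binary layout: the same factor is now an ordinary cluster effect.
  m_binary <- lme4::glmer(y ~ 1 + (1 | group),
                          family = binomial, data = binary)
  print(lme4::fixef(m_grouped)); print(lme4::fixef(m_binary))
  print(lme4::VarCorr(m_grouped)); print(lme4::VarCorr(m_binary))
}
```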

"What's in a name? That which we call a rose by any other name would 
smell as sweet."

"There is no such thing as a free lunch."

At 08:00 AM 2/12/2011, Jarrod Hadfield wrote:
>Hi Colin,
>
>I have little to add over what John Maindonald said, but I see your
>second question regarding my suggestions for binary/binomial data was
>not answered. In most studies I think binomial data will be
>over-dispersed and adding an observation-level random effect can be a
>good way of modeling this.  You can think of the n trials of a
>binomial observation as a group of n correlated binary variables. The
>variance associated with the observation-level term essentially
>estimates how strong this correlation is (after accounting for other
>fixed/random effects in the model). If the original data are already
>binary then n=1 and there can be no correlation, and so
>over-dispersion with binary data cannot exist.
>
>Cheers,
>
>Jarrod
>
>Quoting Colin Wahl <biowahl at gmail.com>:
>
>>In anticipation of the weekend:
>>In my various readings (Crawley, Zuur, Bolker's ecological models book,
>>the GLMM_TREE article, reworked supplementary material, and R help posts),
>>the discussion of overdispersion for GLMMs is quite convoluted, with
>>different interpretations, different ways to test for it, and different
>>solutions to deal with it. In many cases the differences seem to stem from
>>the type of data being analyzed (e.g. binomial vs. Poisson) and somewhat
>>subjective choices of which type of residuals to use for which models.
>>
>>The most consistent definition I have found is that overdispersion is
>>indicated by a ratio of residual scaled deviance to residual degrees of
>>freedom greater than 1.
>>
>>Which seems simple enough.
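>>modelB <- glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
>>                family=binomial(link="logit"))
>>rdev <- sum(residuals(modelB)^2)  # sum of squared residuals
>>mdf <- length(fixef(modelB))      # number of fixed-effect parameters
>>rdf <- nrow(ept) - mdf            # approximate residual degrees of freedom
>>rdev/rdf  # 9.7, i.e. >> 1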
>>
>>So I conclude my model is overdispersed. The recent consensus solution seems
>>to be to create and add an individual-level random variable to the model.
>>
>>ept$obs <- 1:nrow(ept) #create individual level random variable 1:72
>>modelBQ<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
>>data=ept, family=binomial(link="logit"))
>>
>>I take a look at the residuals which are now much smaller but are... just...
>>too... good... for my ecological (glmm free) experience to be comfortable
>>with. Additionally, they fit better for intermediate data, which, with
>>binomial errors is the opposite of what I would expect. Feel free to inspect
>>them in the attached image (if attachments work via mail list... if not, I
>>can send it directly to whomever is interested).
>>
>>Because it looks too good... I test overdispersion again for the new model:
>>
>>rdev/rdf #0.37
>>
>>Which is terrifically underdispersed, for which the consensus is to ignore
>>it (Zuur et al. 2009).
>>
>>So, for my questions:
>>1. Is there anything relevant to add to/adjust in my approach thus far?
>>2. Is overdispersion an issue I should be concerned with for binomial
>>errors? Most sources think so, but I did find a post from Jarrod Hadfield
>>back in August where he states that overdispersion does not exist with a
>>binary response variable:
>>http://web.archiveorange.com/archive/v/rOz2zS8BHYFloUr9F0Ut (though in
>>subsequent posts he recommends the approach I have taken by using an
>>individual level random variable).
>>3. Another approach (from Bolker's TREE_GLMM article) is to use Wald t or F
>>tests instead of Z or X^2 tests to get p values because they "account for
>>the uncertainty in the estimates of overdispersion." That seems like a nice
>>simple option, but I have not seen it come up in any other readings. Thoughts?
>>
>>Here are the glmer model outputs:
>>
>>ModelB
>>Generalized linear mixed model fit by the Laplace approximation
>>Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip)
>>    Data: ept
>>    AIC BIC logLik deviance
>>  754.3 777 -367.2    734.3
>>Random effects:
>>  Groups     Name        Variance Std.Dev.
>>  stream:rip (Intercept) 0.48908  0.69934
>>  stream     (Intercept) 0.18187  0.42647
>>Number of obs: 72, groups: stream:rip, 24; stream, 12
>>
>>Fixed effects:
>>             Estimate Std. Error z value Pr(>|z|)
>>(Intercept) -4.28529    0.50575  -8.473  < 2e-16 ***
>>wshd        -2.06605    0.77357  -2.671  0.00757 **
>>wshf         3.36248    0.65118   5.164 2.42e-07 ***
>>wshg         3.30175    0.76962   4.290 1.79e-05 ***
>>ripN         0.07063    0.61930   0.114  0.90920
>>wshd:ripN    0.60510    0.94778   0.638  0.52319
>>wshf:ripN   -0.80043    0.79416  -1.008  0.31350
>>wshg:ripN   -2.78964    0.94336  -2.957  0.00311 **
>>
>>ModelBQ
>>
>>Generalized linear mixed model fit by the Laplace approximation
>>Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip) + (1 | obs)
>>    Data: ept
>>    AIC   BIC logLik deviance
>>  284.4 309.5 -131.2    262.4
>>Random effects:
>>  Groups     Name        Variance Std.Dev.
>>  obs        (Intercept) 0.30186  0.54942
>>  stream:rip (Intercept) 0.40229  0.63427
>>  stream     (Intercept) 0.12788  0.35760
>>Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12
>>
>>Fixed effects:
>>             Estimate Std. Error z value Pr(>|z|)
>>(Intercept)  -4.2906     0.4935  -8.694  < 2e-16 ***
>>wshd         -2.0557     0.7601  -2.705  0.00684 **
>>wshf          3.3575     0.6339   5.297 1.18e-07 ***
>>wshg          3.3923     0.7486   4.531 5.86e-06 ***
>>ripN          0.1425     0.6323   0.225  0.82165
>>wshd:ripN     0.3708     0.9682   0.383  0.70170
>>wshf:ripN    -0.8665     0.8087  -1.071  0.28400
>>wshg:ripN    -3.1530     0.9601  -3.284  0.00102 **
>>
>>
>>Cheers,
>>--
>>Colin Wahl
>>Department of Biology
>>Western Washington University
>>Bellingham WA, 98225
>>ph: 360-391-9881

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"