[R-sig-ME] overdispersion with binomial data?
Robert A LaBudde
ral at lcfltd.com
Sat Feb 12 19:09:48 CET 2011
Although the idea that binary data cannot be overdispersed by
definition sounds reasonable, in fact this means little.
Consider a grouped data study in which each group has an n and an x,
corresponding to the trials and successes in the group. This typically
leads to overdispersion, because of positive correlation within the group.
Now "explode" the groups into individual binary data, with n such
rows for each group: x success rows and n-x failure rows. The
resulting binary data cannot "by definition" be overdispersed.
This is, however, just a shell game. The overdispersion in the
first dataset is now clustering in the second dataset. The cluster
variable is "group". The same effect is there, just as a different
term in the model.
Including an "observation" variable to deal with overdispersion is
equivalent to adding the same clustering variable in the binary dataset.
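A minimal base-R sketch of the "explode" step described above follows. It uses a small made-up data frame, `grouped`, with hypothetical columns `group`, `n` and `x`. It only shows that the expansion loses nothing: summing the binary responses within `group` recovers each group's successes. Whatever correlation existed within a group is therefore still present, now as clustering by `group`.

```r
# Hypothetical grouped data: each row is one group with n trials and x successes.
grouped <- data.frame(group = factor(c("a", "b", "c")),
                      n = c(5, 3, 4),
                      x = c(4, 0, 2))

# "Explode" each group into n binary rows: x successes (1) and n - x failures (0).
binary <- data.frame(
  group = rep(grouped$group, times = grouped$n),
  y = unlist(mapply(function(n, x) rep(c(1, 0), times = c(x, n - x)),
                    grouped$n, grouped$x, SIMPLIFY = FALSE))
)

# Nothing is lost: summing y within each group recovers that group's x.
recovered <- tapply(binary$y, binary$group, sum)
print(recovered)
```

On a real data set of reasonable size, lme4 could fit the two forms as follows:

- Grouped form: `glmer(cbind(x, n - x) ~ 1 + (1 | group), family = binomial, data = grouped)`. Here `(1 | group)` is an observation-level effect, because each row is one group.
- Binary form: `glmer(y ~ 1 + (1 | group), family = binomial, data = binary)`.

The two conditional log-likelihoods differ only by the constant binomial coefficients, so the fits should agree. That comparison is not run here.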
"What's in a name? That which we call a rose by any other name would
smell as sweet."
"There is no such thing as a free lunch."
At 08:00 AM 2/12/2011, Jarrod Hadfield wrote:
>Hi Colin,
>
>I have little to add over what John Maindonald said, but I see your
>second question regarding my suggestions for binary/binomial data was
>not answered. In most studies I think binomial data will be
>over-dispersed and adding an observation-level random effect can be a
>good way of modeling this. You can think of the n trials of a
>binomial observation as a group of n correlated binary variables. The
>variance associated with the observation-level term essentially
>estimates how strong this correlation is (after accounting for other
>fixed/random effects in the model). If the original data are already
>binary then n=1 and there can be no correlation, and so
>over-dispersion with binary data cannot exist.
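A small base-R simulation of the argument above follows, with made-up values: 5000 observations and an observation-level effect with SD 1 on the logit scale, with no covariates. Each observation gets its own success probability, so its trials are correlated.

- With 20 trials per observation, the variance is well above the binomial value.
- With a single trial per observation, the same mechanism leaves no trace, because the variance of a 0/1 variable is fixed by its mean.

`dispersion()` is a hypothetical helper written only for this sketch.

```r
set.seed(1)
n_obs <- 5000   # hypothetical number of observations (rows)
sigma <- 1      # hypothetical SD of the observation-level effect, logit scale

# Each observation gets its own probability via an observation-level effect,
# which makes the trials within an observation positively correlated.
p <- plogis(-0.5 + rnorm(n_obs, sd = sigma))

# Hypothetical helper: observed variance of the counts divided by the variance
# a plain binomial with the same overall mean would have, size * pbar * (1 - pbar).
dispersion <- function(y, size) {
  pbar <- mean(y) / size
  mean((y - mean(y))^2) / (size * pbar * (1 - pbar))
}

y20 <- rbinom(n_obs, size = 20, prob = p)  # 20 trials per observation
y1  <- rbinom(n_obs, size = 1,  prob = p)  # binary: one trial per observation

dispersion(y20, 20)  # well above 1: the correlation shows up as overdispersion
dispersion(y1, 1)    # 1 (up to rounding) whatever sigma is
```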
>
>Cheers,
>
>Jarrod
>
>Quoting Colin Wahl <biowahl at gmail.com>:
>
>>In anticipation of the weekend:
>>In my various readings (Crawley, Zuur, Bolker's ecological models book,
>>the TREE GLMM article and its reworked supplementary material, and R-help
>>posts), the discussion of overdispersion for GLMMs is muddied by different
>>interpretations, different ways to test for it, and different solutions to
>>deal with it. In many cases the differences seem to stem from the type of
>>data being analyzed (e.g. binomial vs. Poisson) and from somewhat subjective
>>choices about which type of residuals to use for which models.
>>
>>The most consistent definition I have found is that a model is overdispersed
>>when the ratio of the residual scaled deviance to the residual degrees of
>>freedom is > 1.
>>
>>Which seems simple enough.
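>>library(lme4)
>>modelB <- glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
>>                family=binomial(link="logit"))
>>rdev <- sum(residuals(modelB)^2)   # sum of squared residuals of modelB
>>mdf <- length(fixef(modelB))       # number of fixed-effect parameters
>>rdf <- nrow(ept) - mdf             # residual degrees of freedom
>>rdev/rdf #9.7 >> 1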
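As a point of comparison, here is a minimal sketch of the same ratio computed two ways. It uses a plain binomial `glm` fitted to simulated data, not the `ept` data.

- Deviance residuals: these are what `residuals()` returns by default for a `glm`.
- Pearson residuals: many sources use these for this check instead.

`dispersion_ratios()` and its `n_extra_par` argument are hypothetical. Adapting this to a `glmer` fit is an assumption left to the reader: count the fixed effects with `fixef()` as above, and decide separately whether to subtract the variance parameters too.

```r
set.seed(2)
n_obs <- 72                      # same number of rows as ept, but simulated data
size  <- 30                      # hypothetical number of trials per row
x     <- rnorm(n_obs)
# Observation-level noise on the logit scale makes the data overdispersed.
p     <- plogis(-1 + 0.5 * x + rnorm(n_obs, sd = 0.8))
dat   <- data.frame(x = x, succ = rbinom(n_obs, size, p), size = size)

fit <- glm(cbind(succ, size - succ) ~ x, family = binomial, data = dat)

# Dispersion ratios for a fitted binomial glm. n_extra_par optionally subtracts
# parameters that df.residual() does not count (for a GLMM one might subtract
# the variance parameters; that is a convention, not a rule).
dispersion_ratios <- function(model, n_extra_par = 0) {
  rdf     <- df.residual(model) - n_extra_par
  dev_ss  <- sum(residuals(model, type = "deviance")^2)  # equals deviance(model)
  pear_ss <- sum(residuals(model, type = "pearson")^2)   # Pearson chi-square
  c(rdf = rdf, deviance_ratio = dev_ss / rdf, pearson_ratio = pear_ss / rdf)
}

dispersion_ratios(fit)
```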
>>
>>So I conclude my model is overdispersed. The recent consensus solution seems
>>to be to create an individual-level random effect and add it to the model.
>>
>>ept$obs <- 1:nrow(ept) #create individual level random variable 1:72
>>modelBQ<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
>>data=ept, family=binomial(link="logit"))
>>
>>I take a look at the residuals, which are now much smaller but are... just...
>>too... good... for my ecological (GLMM-free) experience to be comfortable
>>with. Additionally, they fit better for intermediate values, which, with
>>binomial errors, is the opposite of what I would expect. Feel free to inspect
>>them in the attached image (if attachments work via the mailing list... if
>>not, I can send it directly to whoever is interested).
>>
>>Because it looks too good... I test overdispersion again for the new model:
>>
>>rdev/rdf #0.37
>>
>>Which is terrifically underdispersed, for which the consensus is to ignore
>>it (Zuur et al. 2009).
>>
>>So, for my questions:
>>1. Is there anything relevant to add to/adjust in my approach thus far?
>>2. Is overdispersion an issue I should be concerned with for binomial
>>errors? Most sources think so, but I did find a post from Jarrod Hadfield
>>back in August where he states that overdispersion does not exist with a
>>binary response variable:
>>http://web.archiveorange.com/archive/v/rOz2zS8BHYFloUr9F0Ut (though in
>>subsequent posts he recommends the approach I have taken, using an
>>individual-level random effect).
>>3. Another approach (from Bolker's TREE GLMM article) is to use Wald t or F
>>tests instead of Z or chi-squared tests to get p-values, because they
>>"account for the uncertainty in the estimates of overdispersion." That seems
>>like a nice, simple option, but I have not seen it come up in any other
>>readings. Thoughts?
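A minimal sketch of the idea in option 3 follows. It shows the plain-GLM form rather than a GLMM, on simulated data with a hypothetical factor `g`.

- A quasi-binomial `glm` estimates the dispersion as the Pearson chi-square over the residual df.
- It scales the standard errors by the square root of that dispersion.
- It reports t rather than Z statistics.
- `anova(..., test = "F")` gives the corresponding F test.

For a GLMM, the residual degrees of freedom for such t and F tests are not well defined. That is part of why this option can be less straightforward there.

```r
set.seed(3)
n_obs <- 72
size  <- 30                                                 # hypothetical trials per row
x     <- rnorm(n_obs)
g     <- factor(rep(c("A", "B", "C"), length.out = n_obs))  # hypothetical factor
p     <- plogis(-1 + 0.5 * x + rnorm(n_obs, sd = 0.8))      # overdispersed
dat   <- data.frame(x = x, g = g, succ = rbinom(n_obs, size, p), size = size)

form   <- cbind(succ, size - succ) ~ x + g
fit_b  <- glm(form, family = binomial,      data = dat)  # dispersion fixed at 1
fit_qb <- glm(form, family = quasibinomial, data = dat)  # dispersion estimated

# Estimated dispersion: Pearson chi-square / residual df.
phi <- summary(fit_qb)$dispersion

# Point estimates are identical; standard errors are inflated by sqrt(phi).
se_ratio <- coef(summary(fit_qb))[, "Std. Error"] /
            coef(summary(fit_b))[, "Std. Error"]

# Wald t tests (t value, Pr(>|t|)) on df.residual(fit_qb) degrees of freedom.
print(coef(summary(fit_qb)))

# F test for the factor g, using the dispersion estimated from the larger model.
fit_qb0 <- update(fit_qb, . ~ . - g)
print(anova(fit_qb0, fit_qb, test = "F"))
```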
>>
>>Here are the glmer model outputs:
>>
>>ModelB
>>Generalized linear mixed model fit by the Laplace approximation
>>Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip)
>> Data: ept
>>   AIC BIC logLik deviance
>> 754.3 777 -367.2    734.3
>>Random effects:
>> Groups     Name        Variance Std.Dev.
>> stream:rip (Intercept) 0.48908  0.69934
>> stream     (Intercept) 0.18187  0.42647
>>Number of obs: 72, groups: stream:rip, 24; stream, 12
>>
>>Fixed effects:
>>            Estimate Std. Error z value Pr(>|z|)
>>(Intercept) -4.28529    0.50575  -8.473  < 2e-16 ***
>>wshd        -2.06605    0.77357  -2.671  0.00757 **
>>wshf         3.36248    0.65118   5.164 2.42e-07 ***
>>wshg         3.30175    0.76962   4.290 1.79e-05 ***
>>ripN         0.07063    0.61930   0.114  0.90920
>>wshd:ripN    0.60510    0.94778   0.638  0.52319
>>wshf:ripN   -0.80043    0.79416  -1.008  0.31350
>>wshg:ripN   -2.78964    0.94336  -2.957  0.00311 **
>>
>>ModelBQ
>>
>>Generalized linear mixed model fit by the Laplace approximation
>>Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip) + (1 | obs)
>> Data: ept
>>   AIC   BIC logLik deviance
>> 284.4 309.5 -131.2    262.4
>>Random effects:
>> Groups     Name        Variance Std.Dev.
>> obs        (Intercept) 0.30186  0.54942
>> stream:rip (Intercept) 0.40229  0.63427
>> stream     (Intercept) 0.12788  0.35760
>>Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12
>>
>>Fixed effects:
>>            Estimate Std. Error z value Pr(>|z|)
>>(Intercept)  -4.2906     0.4935  -8.694  < 2e-16 ***
>>wshd         -2.0557     0.7601  -2.705  0.00684 **
>>wshf          3.3575     0.6339   5.297 1.18e-07 ***
>>wshg          3.3923     0.7486   4.531 5.86e-06 ***
>>ripN          0.1425     0.6323   0.225  0.82165
>>wshd:ripN     0.3708     0.9682   0.383  0.70170
>>wshf:ripN    -0.8665     0.8087  -1.071  0.28400
>>wshg:ripN    -3.1530     0.9601  -3.284  0.00102 **
>>
>>
>>Cheers,
>>--
>>Colin Wahl
>>Department of Biology
>>Western Washington University
>>Bellingham WA, 98225
>>ph: 360-391-9881
>
>--
>The University of Edinburgh is a charitable body, registered in
>Scotland, with registration number SC005336.
>
>_______________________________________________
>R-sig-mixed-models at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd. URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239 Fax: 757-467-2947
"Vere scire est per causas scire"