[R-sig-ME] overdispersion with binomial data?
Robert A LaBudde
ral at lcfltd.com
Sat Feb 12 04:07:08 CET 2011
1. Typically underdispersion is not considered as serious as
overdispersion because there are fewer natural explanations for it
(e.g., negative correlation among observations).
2. You do have a causal explanation for the underdispersion: you
forced an additional term into the model, and this can be an
over-correction. So, a posteriori, underdispersion is just as
important as overdispersion. (It is similar to differencing an
AR(1) time series.)
3. Your observations indicate your fix is overkill.
4. Using a t instead of a z test or confidence interval is
conservative, but won't make up for your serious overdispersion
(a standard-error inflation factor of 3.1 = sqrt(9.7)).
5. A simple fix for overdispersion is to multiply the standard errors,
and hence the confidence-interval half-widths, by the 3.1 factor here.
6. A better method might be to fit using a quasibinomial or a
negative binomial or a beta-binomial.
7. The best method is to examine your data to find out where the
clustering occurs and find a causal explanation for it. Then adjust
your model to account for the extra-binomial variation.
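
Items 5 and 6 can be sketched in base R as follows. This is only a minimal illustration under stated assumptions: the data are simulated (the names x, y and trials, the 50 trials per observation, and the noise level are made up and are not the poster's ept data), and it uses glm() rather than glmer() so that it runs without extra packages. It estimates the dispersion from the Pearson residuals, inflates the binomial standard errors by its square root, and checks that a quasibinomial fit gives the same standard errors.

```r
# Hypothetical simulated data; NOT the poster's `ept` data set.
set.seed(1)
n_obs  <- 72                       # same number of rows as in the post
trials <- rep(50, n_obs)           # assumed binomial denominator
x      <- rep(c(0, 1), each = n_obs / 2)

# Observation-level noise on the logit scale creates extra-binomial variation.
eta <- -1 + 0.5 * x + rnorm(n_obs, sd = 0.8)
y   <- rbinom(n_obs, size = trials, prob = plogis(eta))

# Plain binomial fit, which ignores the extra variation.
fit_bin <- glm(cbind(y, trials - y) ~ x, family = binomial)

# Pearson dispersion estimate: squared Pearson residuals over residual df.
phi <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)

# Item 5: inflate the binomial standard errors by sqrt(phi).
se_bin    <- sqrt(diag(vcov(fit_bin)))
se_scaled <- se_bin * sqrt(phi)

# Item 6: a quasibinomial fit applies the same scaling internally.
fit_quasi <- glm(cbind(y, trials - y) ~ x, family = quasibinomial)
se_quasi  <- sqrt(diag(vcov(fit_quasi)))

print(c(phi = phi, sqrt_phi = sqrt(phi)))
print(cbind(se_bin, se_scaled, se_quasi))
```

The point estimates are identical in the two fits; only the standard errors change. A beta-binomial or a mixed model with an explicit clustering term would model the extra variation rather than just rescale for it.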
At 08:58 PM 2/11/2011, Colin Wahl wrote:
>In anticipation of the weekend:
>In my various readings (Crawley, Zuur, Bolker's ecological models book, and
>the GLMM_TREE article, reworked supplementary material and R help posts) the
>discussion of overdispersion for glmm is quite convoluted by different
>interpretations, different ways to test for it, and different solutions to
>deal with it. In many cases differences seem to stem from the type of data
>being analyzed (e.g. binomial vs. poisson) and somewhat subjective options
>for which type of residuals to use for which models.
>
>The most consistent definition I have found is overdispersion is defined by
>a ratio of residual scaled deviance to the residual degrees of freedom > 1.
>
>Which seems simple enough.
> > modelB<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
>family=binomial(link="logit"))
> > rdev <- sum(residuals(modelB)^2)
> > mdf <- length(fixef(modelB))
> > rdf <- nrow(ept)-mdf
> > rdev/rdf #9.7 >>1
>
>So I conclude my model is overdispersed. The recent consensus solution seems
>to be to create and add an individual-level random variable to the model.
>
>ept$obs <- 1:nrow(ept) #create individual level random variable 1:72
>modelBQ<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
>data=ept, family=binomial(link="logit"))
>
>I take a look at the residuals which are now much smaller but are... just...
>too... good... for my ecological (glmm free) experience to be comfortable
>with. Additionally, they fit better for intermediate data, which, with
>binomial errors is the opposite of what I would expect. Feel free to inspect
>them in the attached image (if attachments work via mail list... if not, I
>can send it directly to whomever is interested).
>
>Because it looks too good... I test overdispersion again for the new model:
>
>rdev/rdf #0.37
>
>Which is terrifically underdispersed, for which the consensus is to ignore
>it (Zuur et al. 2009).
>
>So, for my questions:
>1. Is there anything relevant to add to/adjust in my approach thus far?
>2. Is overdispersion an issue I should be concerned with for binomial
>errors? Most sources think so, but I did find a post from Jerrod Hadfield
>back in August where he states that overdispersion does not exist with a
>binary response variable:
>http://web.archiveorange.com/archive/v/rOz2zS8BHYFloUr9F0Ut (though in
>subsequent posts he recommends the approach I have taken by using an
>individual level random variable).
>3. Another approach (from Bolker's TREE_GLMM article) is to use Wald t or F
>tests instead of Z or X^2 tests to get p values because they "account for
>the uncertainty in the estimates of overdispersion." That seems like a nice
>simple option, I have not seen this come up in any other readings. Thoughts?
>
>
>
>
>Here are the glmer model outputs:
>
>ModelB
>Generalized linear mixed model fit by the Laplace approximation
>Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip)
>   Data: ept
>   AIC BIC logLik deviance
> 754.3 777 -367.2    734.3
>Random effects:
> Groups     Name        Variance Std.Dev.
> stream:rip (Intercept) 0.48908  0.69934
> stream     (Intercept) 0.18187  0.42647
>Number of obs: 72, groups: stream:rip, 24; stream, 12
>
>Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
>(Intercept)  -4.28529    0.50575  -8.473  < 2e-16 ***
>wshd         -2.06605    0.77357  -2.671  0.00757 **
>wshf          3.36248    0.65118   5.164 2.42e-07 ***
>wshg          3.30175    0.76962   4.290 1.79e-05 ***
>ripN          0.07063    0.61930   0.114  0.90920
>wshd:ripN     0.60510    0.94778   0.638  0.52319
>wshf:ripN    -0.80043    0.79416  -1.008  0.31350
>wshg:ripN    -2.78964    0.94336  -2.957  0.00311 **
>
>ModelBQ
>
>Generalized linear mixed model fit by the Laplace approximation
>Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip) + (1 | obs)
>   Data: ept
>   AIC   BIC logLik deviance
> 284.4 309.5 -131.2    262.4
>Random effects:
> Groups     Name        Variance Std.Dev.
> obs        (Intercept) 0.30186  0.54942
> stream:rip (Intercept) 0.40229  0.63427
> stream     (Intercept) 0.12788  0.35760
>Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12
>
>Fixed effects:
>            Estimate Std. Error z value Pr(>|z|)
>(Intercept)  -4.2906     0.4935  -8.694  < 2e-16 ***
>wshd         -2.0557     0.7601  -2.705  0.00684 **
>wshf          3.3575     0.6339   5.297 1.18e-07 ***
>wshg          3.3923     0.7486   4.531 5.86e-06 ***
>ripN          0.1425     0.6323   0.225  0.82165
>wshd:ripN     0.3708     0.9682   0.383  0.70170
>wshf:ripN    -0.8665     0.8087  -1.071  0.28400
>wshg:ripN    -3.1530     0.9601  -3.284  0.00102 **
>
>
>Cheers,
>--
>Colin Wahl
>Department of Biology
>Western Washington University
>Bellingham WA, 98225
>ph: 360-391-9881
>
>[Attachment: ModelComp2.png (image/png)]
>
>
>_______________________________________________
>R-sig-mixed-models at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd. URL: http://lcfltd.com/
824 Timberlake Drive Tel: 757-467-0954
Virginia Beach, VA 23464-3239 Fax: 757-467-2947
"Vere scire est per causas scire"