[R-sig-ME] overdispersion with binomial data?

Sat Feb 12 04:07:08 CET 2011

1. Underdispersion is typically not considered as serious as 
overdispersion because fewer natural mechanisms produce it 
(essentially, negative correlation among observations).

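2. In your case you do have a causal explanation for the 
underdispersion: you forced an additional term (the observation-level 
random effect) into the model, and this can over-correct. So here, a 
posteriori, the underdispersion matters just as much as the 
overdispersion did. (This is similar to differencing a stationary 
AR(1) time series, which over-differences it and induces negative 
autocorrelation.)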

3. Your observations (residuals that look too good, and a ratio that 
drops from 9.7 to 0.37) indicate that your fix is overkill.

4. Using a t instead of a z test or confidence interval is 
conservative, but it won't make up for overdispersion as serious as 
yours: a ratio of 9.7 means your standard errors are too small by a 
factor of about 3.1 = sqrt(9.7).

5. A simple fix for overdispersion is to multiply the standard errors, 
and hence the half-widths of the confidence intervals, by that factor 
of 3.1.

6. A better method might be to fit a quasibinomial, negative binomial, 
or beta-binomial model.

7. The best method is to examine your data to find out where the 
clustering occurs and find a causal explanation for it. Then adjust 
your model to account for the extra-binomial variation.
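A minimal base-R sketch of points 4 to 6, assuming an ordinary (non-mixed) binomial glm() fitted to simulated, hypothetical data rather than the ept data and glmer() models quoted below. It computes a Pearson dispersion ratio, widens Wald intervals by its square root, and checks that a quasibinomial fit gives the same estimates with standard errors scaled by that same factor. All variable names here are illustrative only.

```r
# Sketch only: simulated, hypothetical data, NOT the ept data from the post.
set.seed(1)
n_obs <- 72                                   # same row count as the post
size  <- rep(50L, n_obs)                      # hypothetical binomial totals
group <- factor(rep(c("a", "b"), each = n_obs / 2))

# Extra-binomial variation: each observation gets its own success
# probability drawn from a beta distribution (a beta-binomial process).
p_true <- ifelse(group == "a", 0.2, 0.4)
p_obs  <- rbeta(n_obs, p_true * 5, (1 - p_true) * 5)
succ   <- rbinom(n_obs, size, p_obs)

fit_bin <- glm(cbind(succ, size - succ) ~ group, family = binomial)

# Pearson dispersion ratio: sum of squared Pearson residuals / residual df
phi     <- sum(residuals(fit_bin, type = "pearson")^2) / df.residual(fit_bin)
inflate <- sqrt(phi)   # e.g. phi = 9.7 would give a factor of about 3.1

# Crude fix (point 5): multiply the Wald standard errors by sqrt(phi)
est    <- coef(fit_bin)
se     <- sqrt(diag(vcov(fit_bin)))
z      <- qnorm(0.975)
ci_adj <- cbind(lower = est - z * inflate * se,
                upper = est + z * inflate * se)

# Quasibinomial fit (point 6): same estimates, standard errors scaled by
# the square root of an estimated dispersion that matches phi up to
# IRLS convergence tolerance
fit_quasi <- glm(cbind(succ, size - succ) ~ group, family = quasibinomial)
se_quasi  <- sqrt(diag(vcov(fit_quasi)))

print(phi)
print(ci_adj)
```

Because the quasibinomial dispersion is estimated, summary(fit_quasi) reports t rather than z statistics. That is the "t instead of z" adjustment of point 4; the widening by sqrt(phi) is what actually corrects for the overdispersion.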

At 08:58 PM 2/11/2011, Colin Wahl wrote:
>In anticipation of the weekend:
>In my various readings (Crawley, Zuur, Bolker's ecological models book,
>the GLMM TREE article, the reworked supplementary material, and R-help
>posts), the discussion of overdispersion for GLMMs is quite muddled by
>different interpretations, different ways to test for it, and different
>solutions for dealing with it. In many cases the differences seem to stem
>from the type of data being analyzed (e.g. binomial vs. Poisson) and from
>somewhat subjective choices about which type of residuals to use for which
>models.
>
>The most consistent definition I have found is that overdispersion is
>indicated by a ratio of residual scaled deviance to residual degrees of
>freedom greater than 1.
>
>Which seems simple enough.
> > modelB <- glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
> +     family=binomial(link="logit"))
> > rdev <- sum(residuals(modelB)^2)   # sum of squared residuals
> > mdf <- length(fixef(modelB))       # fixed effects only; variances not counted
> > rdf <- nrow(ept)-mdf               # residual degrees of freedom
> > rdev/rdf #9.7 >>1
>
>So I conclude my model is overdispersed. The recent consensus solution seems
>to be to create an individual-level (observation-level) random effect and
>add it to the model.
>
>ept$obs <- 1:nrow(ept) # observation-level grouping variable, 1:72
>modelBQ<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
>data=ept, family=binomial(link="logit"))
>
>I take a look at the residuals, which are now much smaller but are... just...
>too... good... for my ecological (GLMM-free) experience to be comfortable
>with. Additionally, they fit better for intermediate data, which, with
>binomial errors, is the opposite of what I would expect. Feel free to inspect
>them in the attached image (if attachments work via mail list... if not, I
>can send it directly to whomever is interested).
>
>Because it looks too good... I test overdispersion again for the new model:
>
>rdev/rdf #0.37
>
>That is strongly underdispersed, and the consensus for underdispersion is
>to ignore it (Zuur et al. 2009).
>
>So, for my questions:
>1. Is there anything relevant to add to/adjust in my approach thus far?
>2. Is overdispersion an issue I should be concerned with for binomial
>errors? Most sources think so, but I did find a post from Jarrod Hadfield
>back in August where he states that overdispersion does not exist with a
>binary response variable:
>http://web.archiveorange.com/archive/v/rOz2zS8BHYFloUr9F0Ut (though in
>subsequent posts he recommends the approach I have taken by using an
>individual level random variable).
>3. Another approach (from Bolker's GLMM TREE article) is to use Wald t or F
>tests instead of Z or chi-squared tests to get p values, because they
>"account for the uncertainty in the estimates of overdispersion." That seems
>like a nice, simple option, but I have not seen it come up in any other
>readings. Thoughts?
>
>Here are the glmer model outputs:
>
>ModelB
>Generalized linear mixed model fit by the Laplace approximation
>Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip)
>    Data: ept
>    AIC BIC logLik deviance
>  754.3 777 -367.2    734.3
>Random effects:
>  Groups     Name        Variance Std.Dev.
>  stream:rip (Intercept) 0.48908  0.69934
>  stream     (Intercept) 0.18187  0.42647
>Number of obs: 72, groups: stream:rip, 24; stream, 12
>
>Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
>(Intercept) -4.28529    0.50575  -8.473  < 2e-16 ***
>wshd        -2.06605    0.77357  -2.671  0.00757 **
>wshf         3.36248    0.65118   5.164 2.42e-07 ***
>wshg         3.30175    0.76962   4.290 1.79e-05 ***
>ripN         0.07063    0.61930   0.114  0.90920
>wshd:ripN    0.60510    0.94778   0.638  0.52319
>wshf:ripN   -0.80043    0.79416  -1.008  0.31350
>wshg:ripN   -2.78964    0.94336  -2.957  0.00311 **
>
>ModelBQ
>
>Generalized linear mixed model fit by the Laplace approximation
>Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip) + (1 | obs)
>    Data: ept
>    AIC   BIC logLik deviance
>  284.4 309.5 -131.2    262.4
>Random effects:
>  Groups     Name        Variance Std.Dev.
>  obs        (Intercept) 0.30186  0.54942
>  stream:rip (Intercept) 0.40229  0.63427
>  stream     (Intercept) 0.12788  0.35760
>Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12
>
>Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
>(Intercept)  -4.2906     0.4935  -8.694  < 2e-16 ***
>wshd         -2.0557     0.7601  -2.705  0.00684 **
>wshf          3.3575     0.6339   5.297 1.18e-07 ***
>wshg          3.3923     0.7486   4.531 5.86e-06 ***
>ripN          0.1425     0.6323   0.225  0.82165
>wshd:ripN     0.3708     0.9682   0.383  0.70170
>wshf:ripN    -0.8665     0.8087  -1.071  0.28400
>wshg:ripN    -3.1530     0.9601  -3.284  0.00102 **
>
>
>Cheers,
>--
>Colin Wahl
>Department of Biology
>Western Washington University
>Bellingham WA, 98225
>ph: 360-391-9881
>
>[Attached image: ModelComp2.png]
>
>
>_______________________________________________
>R-sig-mixed-models at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"