[R-sig-ME] overdispersion with binomial data?
Jarrod Hadfield
j.hadfield at ed.ac.uk
Sat Feb 12 14:00:29 CET 2011
Hi Colin,
I have little to add over what John Maindonald said, but I see your
second question regarding my suggestions for binary/binomial data was
not answered. In most studies I think binomial data will be
over-dispersed and adding an observation-level random effect can be a
good way of modeling this. You can think of the n trials of a
binomial observation as a group of n correlated binary variables. The
variance associated with the observation-level term essentially
estimates how strong this correlation is (after accounting for other
fixed/random effects in the model). If the original data are already
binary then n=1 and there can be no correlation, and so
over-dispersion with binary data cannot exist.
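
A minimal simulation sketch of this argument, in base R. The number of
observations, the intercept and the standard deviation of the
observation-level effect are assumed values chosen for illustration; they
are not taken from Colin's data. Each observation gets its own
logit-scale deviation, and the observed variance of the counts is
compared with the variance a plain binomial would imply:

```r
set.seed(1)
n_obs    <- 20000  # number of simulated observations (assumed)
n_trials <- 20     # trials per binomial observation (assumed)
sigma    <- 1      # SD of the observation-level effect, logit scale (assumed)

# One success probability per observation: intercept plus observation-level effect
p_i <- plogis(-1 + rnorm(n_obs, mean = 0, sd = sigma))

# Ratio of the observed variance to the nominal binomial variance size * p * (1 - p)
dispersion <- function(size) {
  y     <- rbinom(n_obs, size = size, prob = p_i)
  p_bar <- mean(y) / size
  var(y) / (size * p_bar * (1 - p_bar))
}

disp_binomial <- dispersion(n_trials)  # n = 20: trials within an observation are correlated
disp_binary   <- dispersion(1)         # n = 1: nothing to be correlated with

print(c(binomial = disp_binomial, binary = disp_binary))
```

With n = 20 the ratio should come out well above 1, while with n = 1 it
stays at 1 up to sampling noise, because a 0/1 variable with mean p has
variance p(1-p) whatever the observation-level variance is.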
Cheers,
Jarrod
Quoting Colin Wahl <biowahl at gmail.com>:
> In anticipation of the weekend:
> In my various readings (Crawley, Zuur, Bolker's ecological models book, and
> the GLMM_TREE article, reworked supplementary material and R help posts) the
> discussion of overdispersion for glmm is quite convoluted by different
> interpretations, different ways to test for it, and different solutions to
> deal with it. In many cases differences seem to stem from the type of data
> being analyzed (e.g. binomial vs. Poisson) and somewhat subjective options
> for which type of residuals to use for which models.
>
> The most consistent definition I have found is that overdispersion is indicated
> by a ratio of residual scaled deviance to the residual degrees of freedom > 1.
>
> Which seems simple enough.
>> modelB<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
> family=binomial(link="logit"))
>> rdev <- sum(residuals(modelB)^2)
>> mdf <- length(fixef(modelB))
>> rdf <- nrow(ept)-mdf
>> rdev/rdf #9.7 >>1
>
> So I conclude my model is overdispersed. The recent consensus solution seems
> to be to create and add an individual-level random variable to the model.
>
> ept$obs <- 1:nrow(ept) #create individual level random variable 1:72
> modelBQ<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
> data=ept, family=binomial(link="logit"))
>
> I take a look at the residuals which are now much smaller but are... just...
> too... good... for my ecological (glmm free) experience to be comfortable
> with. Additionally, they fit better for intermediate data, which, with
> binomial errors is the opposite of what I would expect. Feel free to inspect
> them in the attached image (if attachments work via mail list... if not, I
> can send it directly to whomever is interested).
>
> Because it looks too good... I test overdispersion again for the new model:
>
> rdev/rdf #0.37
>
> Which is terrifically underdispersed, for which the consensus is to ignore
> it (Zuur et al. 2009).
>
> So, for my questions:
> 1. Is there anything relevant to add to/adjust in my approach thus far?
> 2. Is overdispersion an issue I should be concerned with for binomial
> errors? Most sources think so, but I did find a post from Jarrod Hadfield
> back in August where he states that overdispersion does not exist with a
> binary response variable:
> http://web.archiveorange.com/archive/v/rOz2zS8BHYFloUr9F0Ut (though in
> subsequent posts he recommends the approach I have taken by using an
> individual level random variable).
> 3. Another approach (from Bolker's TREE_GLMM article) is to use Wald t or F
> tests instead of Z or X^2 tests to get p values because they "account for
> the uncertainty in the estimates of overdispersion." That seems like a nice
> simple option, I have not seen this come up in any other readings. Thoughts?
>
> Here are the glmer model outputs:
>
> ModelB
> Generalized linear mixed model fit by the Laplace approximation
> Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip)
> Data: ept
>    AIC   BIC logLik deviance
>  754.3   777 -367.2    734.3
> Random effects:
>  Groups     Name        Variance Std.Dev.
>  stream:rip (Intercept) 0.48908  0.69934
>  stream     (Intercept) 0.18187  0.42647
> Number of obs: 72, groups: stream:rip, 24; stream, 12
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept) -4.28529    0.50575  -8.473  < 2e-16 ***
> wshd        -2.06605    0.77357  -2.671  0.00757 **
> wshf         3.36248    0.65118   5.164 2.42e-07 ***
> wshg         3.30175    0.76962   4.290 1.79e-05 ***
> ripN         0.07063    0.61930   0.114  0.90920
> wshd:ripN    0.60510    0.94778   0.638  0.52319
> wshf:ripN   -0.80043    0.79416  -1.008  0.31350
> wshg:ripN   -2.78964    0.94336  -2.957  0.00311 **
>
> ModelBQ
>
> Generalized linear mixed model fit by the Laplace approximation
> Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip) + (1 | obs)
> Data: ept
>    AIC   BIC logLik deviance
>  284.4 309.5 -131.2    262.4
> Random effects:
>  Groups     Name        Variance Std.Dev.
>  obs        (Intercept) 0.30186  0.54942
>  stream:rip (Intercept) 0.40229  0.63427
>  stream     (Intercept) 0.12788  0.35760
> Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -4.2906     0.4935  -8.694  < 2e-16 ***
> wshd         -2.0557     0.7601  -2.705  0.00684 **
> wshf          3.3575     0.6339   5.297 1.18e-07 ***
> wshg          3.3923     0.7486   4.531 5.86e-06 ***
> ripN          0.1425     0.6323   0.225  0.82165
> wshd:ripN     0.3708     0.9682   0.383  0.70170
> wshf:ripN    -0.8665     0.8087  -1.071  0.28400
> wshg:ripN    -3.1530     0.9601  -3.284  0.00102 **
>
>
> Cheers,
> --
> Colin Wahl
> Department of Biology
> Western Washington University
> Bellingham WA, 98225
> ph: 360-391-9881
>