[R-sig-ME] overdispersion with binomial data?

Sat Feb 12 14:00:29 CET 2011

Hi Colin,

I have little to add over what John Maindonald said, but I see your  
second question regarding my suggestions for binary/binomial data was  
not answered. In most studies I think binomial data will be  
over-dispersed and adding an observation-level random effect can be a  
good way of modeling this.  You can think of the n trials of a  
binomial observation as a group of n correlated binary variables. The  
variance associated with the observation-level term essentially  
estimates how strong this correlation is (after accounting for other  
fixed/random effects in the model). If the original data are already  
binary then n=1 and there can be no correlation, and so  
over-dispersion with binary data cannot exist.

Cheers,

Jarrod

Quoting Colin Wahl <biowahl at gmail.com>:

> In anticipation of the weekend:
> In my various readings(crawley, zuur, bolker's ecological models book, and
> the GLMM_TREE article, reworked supplementary material and R help posts) the
> discussion of overdispersion for glmm is quite convoluted by different
> interpretations, different ways to test for it, and different solutions to
> deal with it. In many cases differences seem to stem from the type of data
> being analyzed (e.g. binomial vs. poisson) and somewhat subjective options
> for which type of residuals to use for which models.
>
> The most consistent definition I have found is overdispersion is defined by
> a ratio of residual scaled deviance to the residual degrees of freedom > 1.
>
> Which seems simple enough.
>> modelB<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip), data=ept,
> family=binomial(link="logit"))
>> rdev <- sum(residuals(modelBQ)^2)
>> mdf <- length(fixef(modelBQ))
>> rdf <- nrow(ept)-mdf
>> rdev/rdf #9.7 >>1
>
> So I conclude my model is overdispersed. The recent consensus solution seems
> to be to create and add a individual level random variable to the model.
>
> ept$obs <- 1:nrow(ept) #create individual level random variable 1:72
> modelBQ<-glmer(E ~ wsh*rip + (1|stream) + (1|stream:rip) + (1|obs),
> data=ept, family=binomial(link="logit"))
>
> I take a look at the residuals which are now much smaller but are... just...
> too... good... for my ecological (glmm free) experience to be comfortable
> with. Additionally, they fit better for intermediate data, which, with
> binomial errors is the opposite of what I would expect. Feel free to inspect
> them in the attached image (if attachments work via mail list... if not, I
> can send it directly to whomever is interested).
>
> Because it looks too good... I test overdispersion again for the new model:
>
> rdev/rdf #0.37
>
> Which is terrifically underdispersed, for which the consensus is to ignore
> it (Zuur et al. 2009).
>
> So, for my questions:
> 1. Is there anything relevant to add to/adjust in my approach thus far?
> 2. Is overdispersion an issue I should be concerned with for binomial
> errors? Most sources think so, but I did find a post from Jerrod Hadfield
> back in august where he states that overdispersion does not exist with a
> binary response variable:
> http://web.archiveorange.com/archive/v/rOz2zS8BHYFloUr9F0Ut (though in
> subsequent posts he recommends the approach I have taken by using an
> individual level random variable).
> 3. Another approach (from Bolker's TREE_GLMM article) is to use Wald t or F
> tests instead of Z or X^2 tests to get p values because they "account for
> the uncertainty in the estimates of overdispersion." That seems like a nice
> simple option, I have not seen this come up in any other readings. Thoughts?
>
>
>
>
> Here are the glmer model outputs:
>
> ModelB
> Generalized linear mixed model fit by the Laplace approximation
> Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip)
>    Data: ept
>    AIC BIC logLik deviance
>  754.3 777 -367.2    734.3
> Random effects:
>  Groups     Name        Variance Std.Dev.
>  stream:rip (Intercept) 0.48908  0.69934
>  stream     (Intercept) 0.18187  0.42647
> Number of obs: 72, groups: stream:rip, 24; stream, 12
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept) -4.28529    0.50575  -8.473  < 2e-16 ***
> wshd        -2.06605    0.77357  -2.671  0.00757 **
> wshf         3.36248    0.65118   5.164 2.42e-07 ***
> wshg         3.30175    0.76962   4.290 1.79e-05 ***
> ripN         0.07063    0.61930   0.114  0.90920
> wshd:ripN    0.60510    0.94778   0.638  0.52319
> wshf:ripN   -0.80043    0.79416  -1.008  0.31350
> wshg:ripN   -2.78964    0.94336  -2.957  0.00311 **
>
> ModelBQ
>
> Generalized linear mixed model fit by the Laplace approximation
> Formula: E ~ wsh * rip + (1 | stream) + (1 | stream:rip) + (1 | obs)
>    Data: ept
>    AIC   BIC logLik deviance
>  284.4 309.5 -131.2    262.4
> Random effects:
>  Groups     Name        Variance Std.Dev.
>  obs        (Intercept) 0.30186  0.54942
>  stream:rip (Intercept) 0.40229  0.63427
>  stream     (Intercept) 0.12788  0.35760
> Number of obs: 72, groups: obs, 72; stream:rip, 24; stream, 12
>
> Fixed effects:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  -4.2906     0.4935  -8.694  < 2e-16 ***
> wshd         -2.0557     0.7601  -2.705  0.00684 **
> wshf          3.3575     0.6339   5.297 1.18e-07 ***
> wshg          3.3923     0.7486   4.531 5.86e-06 ***
> ripN          0.1425     0.6323   0.225  0.82165
> wshd:ripN     0.3708     0.9682   0.383  0.70170
> wshf:ripN    -0.8665     0.8087  -1.071  0.28400
> wshg:ripN    -3.1530     0.9601  -3.284  0.00102 **
>
>
> Cheers,
> --
> Colin Wahl
> Department of Biology
> Western Washington University
> Bellingham WA, 98225
> ph: 360-391-9881
>

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.