[R-sig-ME] Prudent steps for overdispersion in glmer models (logit link)
Benjamin Dantzer
bendantzer at gmail.com
Thu Jul 9 19:42:04 CEST 2009
Dear Mixed Modelers,
I encounter a lot of overdispersion (dispersion parameters >13) when
analyzing unbalanced proportion data, and I'm trying to understand what
prudent steps ecologists should follow when fitting GLMMs (logit link)
to overdispersed data with lme4. I am aware of other sources of
information on this topic and have read widely, but much of my
uncertainty comes from the current issue with lme4 and quasi-likelihood
families (quasibinomial in my case) that has been discussed on this list
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q3/001404.html
and https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q4/001632.html).
I use the following behavioral data as an example. The data come from
7-min focal observations in which specific behaviors are recorded at
30-s intervals. In addition to multivariate approaches, I use GLMMs to
determine how the proportion of time spent in specific behaviors varies
across a season or among breeding attempts.
In the example below, I'm interested in how the proportion of time a
squirrel spends eating changes seasonally. A quadratic term in Day is
included to allow for non-linear change. I first fit a fixed-effects-only
GLM to check for overdispersion and then a GLMM with random effects for
both animal and observer (because there are repeated measures on the
same animals and by the same observers).
Mac OS X, R version 2.9.0, lme4 version 0.999375-31
GLM Example to assess overdispersion:
Call:
glm(formula = cbind(No.Nest, 15 - No.Nest) ~ poly(Day, 2),
    family = binomial(link = logit), data = focals)
Deviance Residuals:
   Min      1Q  Median      3Q     Max
-5.326  -2.910  -2.664   3.189   6.967
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.08320    0.02004 -54.042   <2e-16 ***
poly(Day, 2)1 -8.36175    0.59299 -14.101   <2e-16 ***
poly(Day, 2)2  4.92777    0.58027   8.492   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 14476 on 902 degrees of freedom
Residual deviance: 14179 on 900 degrees of freedom
AIC: 14362
Number of Fisher Scoring iterations: 5
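A quick check on the GLM above: the ratio of residual deviance to residual degrees of freedom gives a rough dispersion estimate, and it can be computed straight from the printed summary. The Pearson-based estimate (sum of squared Pearson residuals over residual df) is what summary() reports for quasi families and is usually preferred, but it needs the fitted object, which is only assumed here to be called `fit`. The chi-square tail probability is a rough goodness-of-fit check, not a definitive test.

```r
# Deviance-based dispersion estimate, using the numbers printed in the
# summary above (Residual deviance: 14179 on 900 degrees of freedom).
resid_dev <- 14179
resid_df  <- 900
dev_ratio <- resid_dev / resid_df
dev_ratio   # roughly 15.75, far above the value of 1 assumed by the binomial family

# Upper-tail chi-square probability of a residual deviance this large
# if the binomial model were adequate (effectively zero here).
p_overdisp <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)
p_overdisp

# With the fitted glm object itself (hypothetically named `fit`), the
# Pearson-based estimate would be:
# sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
```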
GLMER Example:
Because there are repeated samples on the same animals and potentially
observer effects, I include random effects for both animal and observer.
glmer(cbind(No.Nest, 15 - No.Nest) ~ poly(Day, 2) + (1 | OBS) + (1 | ID),
      family = binomial(link = logit), data = focals, verbose = TRUE)
0: 12075.576: 0.607569 0.343693 -1.08320 -8.36175 4.92777
1: 11752.951: 1.49913 0.795727 -1.10953 -8.37088 4.92682
2: 11744.740: 1.47953 1.04743 -1.38218 -8.54319 4.89649
3: 11726.917: 1.88345 1.10659 -1.40701 -8.58032 4.89063
4: 11721.286: 2.01823 1.03093 -1.87186 -9.31820 4.75224
5: 11718.158: 2.41774 1.44964 -1.63616 -9.95216 4.64645
6: 11713.721: 2.26994 1.26734 -1.55596 -10.8047 4.52269
7: 11712.202: 2.22602 1.19583 -1.59063 -11.6781 4.70304
8: 11712.095: 2.20938 1.18793 -1.66349 -12.1621 4.17028
9: 11711.920: 2.20456 1.20150 -1.70144 -12.1439 4.55884
10: 11711.912: 2.20558 1.20779 -1.68434 -12.0797 4.51719
11: 11711.912: 2.20180 1.20734 -1.68745 -12.0847 4.51349
12: 11711.912: 2.20177 1.20903 -1.68700 -12.0926 4.51477
13: 11711.912: 2.20155 1.20892 -1.68680 -12.0894 4.51528
14: 11711.912: 2.20153 1.20893 -1.68678 -12.0893 4.51523
Generalized linear mixed model fit by the Laplace approximation
Formula: cbind(No.Nest, 15 - No.Nest) ~ poly(Day, 2) + (1 | OBS) + (1 | ID)
Data: focals.all.repro
   AIC   BIC logLik deviance
 11722 11746  -5856    11712
Random effects:
 Groups Name        Variance Std.Dev.
 ID     (Intercept) 4.8467   2.2015
 OBS    (Intercept) 1.4615   1.2089
Number of obs: 903, groups: ID, 125; OBS, 40
Fixed effects:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    -1.6868     0.3776  -4.467 7.95e-06 ***
poly(Day, 2)1 -12.0893     1.0519 -11.492  < 2e-16 ***
poly(Day, 2)2   4.5139     0.8527   5.294 1.20e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Correlation of Fixed Effects:
            (Intr) p(RD,2)1
ply(RpD,2)1  0.000
ply(RpD,2)2 -0.013   -0.027
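One rough way to ask whether the GLMM above still shows overdispersion once animal and observer variation are modelled is to compare the sum of squared Pearson residuals with the residual degrees of freedom. This is a sketch under several assumptions: it assumes a recent lme4 in which fitted glmer objects support residuals(type = "pearson") and df.residual() (the 2009 version used above may not), the helper `pearson_dispersion` is a hypothetical name, and because the focals data are not available it uses simulated data with hypothetical values loosely patterned on the output above (125 animals, 40 observers, 15 scans per focal). The fit only runs if lme4 is installed. For GLMMs this ratio is a heuristic, not a formal test; values near 1 suggest little remaining overdispersion.

```r
# Hypothetical helper: Pearson-based overdispersion ratio for a fitted model.
# Works for glm objects; for glmer fits it assumes a recent lme4.
pearson_dispersion <- function(model) {
  rp <- residuals(model, type = "pearson")
  sum(rp^2) / df.residual(model)
}

if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(42)
  # Hypothetical simulated data: 903 focals, 125 animals, 40 observers,
  # 15 scans per focal; no variation beyond the two random effects.
  n     <- 903
  id    <- sample(125, n, replace = TRUE)
  obs   <- sample(40, n, replace = TRUE)
  day   <- runif(n, 0, 200)
  u_id  <- rnorm(125, sd = 2.2)   # animal effects (logit scale)
  u_obs <- rnorm(40, sd = 1.2)    # observer effects (logit scale)
  eta   <- -1.7 + u_id[id] + u_obs[obs]
  sim   <- data.frame(ID = factor(id), OBS = factor(obs), Day = day,
                      No.Nest = rbinom(n, size = 15, prob = plogis(eta)))

  gfit <- lme4::glmer(cbind(No.Nest, 15 - No.Nest) ~ poly(Day, 2) +
                        (1 | OBS) + (1 | ID),
                      family = binomial(link = "logit"), data = sim)
  glmm_disp <- pearson_dispersion(gfit)
  # Should be roughly 1 here, since no extra-binomial variation was simulated
  print(glmm_disp)
}
```

Applied to the real fit, the same helper would be called on the glmer object in place of `gfit`.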
Because of the current issue with quasi-likelihood families in lme4
(see links above), am I basically restricted to either 1) dropping the
random effects and fitting a quasibinomial GLM, or 2) acknowledging the
presence of overdispersion but arguing that much of it is due to
heterogeneity among individuals and observers? In other examples I
frequently get standard deviations of the random effects nearly as
large as the estimates of the fixed effects (as in the example in
Bolker et al., 2008). There are no high-leverage outliers for either
the fixed or the random effects, and including additional covariates
does not noticeably reduce the dispersion parameter.
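The two options in the question can be illustrated with simulated data (hypothetical values, not the squirrel data) in which the only extra-binomial variation comes from differences among animals; observer effects are omitted for brevity. A binomial GLM that ignores animal identity then shows a dispersion ratio well above 1, which is the situation described in option 2. Refitting with family = quasibinomial (option 1) leaves the coefficient estimates unchanged and multiplies each standard error by the square root of the estimated dispersion; it does not model the repeated measures themselves. This is a sketch of the general behavior, not a recommendation for either option.

```r
set.seed(1)
# Hypothetical simulated data: 903 focals on 125 animals, 15 scans each.
# The only source of extra-binomial variation is among-animal heterogeneity.
n    <- 903
id   <- sample(125, n, replace = TRUE)
day  <- runif(n, 0, 200)
u_id <- rnorm(125, sd = 2)                   # animal effects (logit scale)
eta  <- -1.7 - 0.01 * (day - 100) + u_id[id]
sim  <- data.frame(ID = factor(id), Day = day,
                   No.Nest = rbinom(n, size = 15, prob = plogis(eta)))

# Binomial GLM ignoring animal identity
fit_bin <- glm(cbind(No.Nest, 15 - No.Nest) ~ poly(Day, 2),
               family = binomial(link = "logit"), data = sim)
# Quasibinomial GLM: same mean model, dispersion estimated from the data
fit_qb  <- glm(cbind(No.Nest, 15 - No.Nest) ~ poly(Day, 2),
               family = quasibinomial(link = "logit"), data = sim)

phi_hat <- summary(fit_qb)$dispersion        # Pearson-based estimate
se_bin  <- sqrt(diag(vcov(fit_bin)))
se_qb   <- sqrt(diag(vcov(fit_qb)))

phi_hat                                      # well above 1 for these data
all.equal(coef(fit_bin), coef(fit_qb))       # identical point estimates
se_qb / se_bin                               # each equal to sqrt(phi_hat), up to rounding
```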
Looking forward to your opinions.
-Ben Dantzer
__________________________________
Ben Dantzer
PhD Candidate
Ecology, Evolutionary Biology, and Behavior Program
Department of Zoology
203 Natural Science Building
Michigan State University
East Lansing, MI 48824-115
Phone: 517-432-5555
Fax: 517-432-2789
Web: http://www.msu.edu/~dantzer
http://www.redsquirrel.msu.edu