[R-sig-ME] Model assessment strategy for binomial mixed model
Adam Smith
raptorbio at hotmail.com
Thu Oct 27 18:24:52 CEST 2011
All,
A couple of weeks ago, I solicited assistance with a binomial mixed model with nested random effects. After much research and guidance from this list, I've formulated an analysis plan that I'd like to run by anyone who would graciously take time to evaluate the process.
The data structure:
> str(abs2009)
'data.frame': 416 obs. of 8 variables:
$ obs : Factor w/ 416 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ plot : Factor w/ 16 levels "N_10","N_2","N_3",..: 1 1 1 1 1 1 1 1 1 1 ...
$ geog : Factor w/ 2 levels "N","S": 1 1 1 1 1 1 1 1 1 1 ...
$ trt : Factor w/ 2 levels "CONT","TRT": 1 1 1 1 1 1 1 1 1 1 ...
$ count: Factor w/ 14 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
$ total: int 143 139 137 135 131 127 123 115 112 109 ...
$ abs : int 4 2 2 4 4 4 8 3 3 18 ...
$ dT : int 4 3 3 3 3 3 3 3 3 3 ...
The number of fruit abscised (abs) relative to the total number of fruit (total) is to be evaluated as a function of treatment (trt), geography (geog), and over time (count); abscission is highly stochastic, thus the use of a factor for the count variable. dT is the length, in days, of the count period during which abscission occurred. There are 16 plots, each composed of two subplots receiving a treatment or control at random.
Step 1: Evaluate the need to account for overdispersion using individual level random effects.
I used the full model with all 2-way interactions and a LRT for the overdispersion parameter:
> mod1 <- glmer (cbind(abs, total-abs) ~ trt + geog + count + trt:geog + trt:count + geog:count + (1|plot) + (1|trt:plot), family = binomial(link="cloglog"), data=abs2009)
> mod2 <- glmer (cbind(abs, total-abs) ~ trt + geog + count + trt:geog + trt:count + geog:count + (1|plot) + (1|trt:plot) + (1|obs), family = binomial(link="cloglog"), data=abs2009)
> anova(mod1,mod2) #overdispersion present
Data: abs2009
Models:
mod1: cbind(abs, total - abs) ~ trt + geog + count + trt:geog + trt:count + geog:count + (1 | plot) + (1 | trt:plot)
mod2: cbind(abs, total - abs) ~ trt + geog + count + trt:geog + trt:count + geog:count + (1 | plot) + (1 | trt:plot) + (1 | obs)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
mod1 45 1832.7 2014.1 -871.35
mod2 46 1136.4 1321.8 -522.21 698.29 1 < 2.2e-16 ***
Step 2: Determine whether length of count period needs to be accounted for.
I chose to plot mod2 residuals as a function of dT:
> boxplot(mod2 at resid~abs2009$dT) # Plot shows no effect of dT (plot at tinyurl.com/dT-vs-resids)
Step 3: Evaluate 2-way interactions:
I used LRT, but if the p-value was marginal, say <= 0.10, I would use parametric bootstrapping. Sound reasonable?
> drop1(mod2, test="Chisq") # None improve the model significantly; dropping all from model
Single term deletions
Model:
cbind(abs, total - abs) ~ trt + geog + count + trt:geog + trt:count + geog:count + (1 | plot) + (1 | trt:plot) + (1 | obs)
Df AIC LRT Pr(Chi)
<none> 1136.4
trt:geog 1 1136.5 2.0470 0.1525
trt:count 13 1119.0 8.6243 0.8007
geog:count 13 1129.1 18.6701 0.1337
Step 4: Evaluate main effects not involved in retained interaction.
I evaluated them similarly to the interactions, above...
> drop1(mod3, test="Chisq") # trt insignificant; test geog and count via parametric bootstrapping
Single term deletions
Model:
cbind(abs, total - abs) ~ trt + geog + count + (1 | plot) + (1 | trt:plot) + (1 | obs)
Df AIC LRT Pr(Chi)
<none> 1111.8
trt 1 1109.8 0.019 0.890959
geog 1 1117.2 7.454 0.006331 **
count 13 1199.1 113.343 < 2.2e-16 ***
Step 5: Bootstrap main effects of geog and count; not shown
Any suggestions to improve this process?
Thanks for considering,
Adam Smith
Dept. Natural
Resources Science
105 Coastal Institute in Kingston
University of Rhode Island
Kingston, RI 02881
Phone: 859-559-7172
Fax: 401-874-4561
More information about the R-sig-mixed-models
mailing list