[R-sig-ME] Bootstrapping mixed effects models

Mike Lawrence Mike.Lawrence at dal.ca
Mon Aug 23 16:04:19 CEST 2010

Hi folks,

Visual data analysis is very important in my field (cognitive
science), and while I know that you can obtain confidence intervals
for the cells of a fixed-effects design (as described at
http://glmm.wikidot.com/faq ), and confidence intervals for each
effect/interaction via MCMC, these approaches (at least, as I
understand them) do not completely satisfy me. I can provide further
details on my (possibly naive) dissatisfaction if necessary,
but for now I would be grateful for feedback on a solution I've come
up with that lets me visualize any level of the data I choose.

The approach I take is to obtain the model predictions for each cell
of the fixed-effects design, then bootstrap distributions of
predictions for each cell. The data I typically encounter have only
one random effect (experiment participant), and many observations
within each participant and cell of the fixed-effects design, so on
each iteration of the bootstrap I resample participants, then resample
observations independently within each individual in the new sample of
participants (a two-stage resampling scheme).
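
The two-stage resampling step could be sketched in R roughly as follows. This is a minimal illustration with simulated data and hypothetical names throughout; in real use each resample would be refit with lmer() and the model predictions extracted, whereas here raw cell means stand in for predictions so the sketch stays self-contained:

```r
## Hypothetical sketch of the two-stage bootstrap:
## stage 1 resamples participants with replacement,
## stage 2 resamples observations within each sampled participant.

set.seed(1)

## toy data: 10 participants x 2 conditions x 20 observations,
## with lognormal "response times" to mimic positive skew
dat <- expand.grid(obs = 1:20, cond = c("A", "B"), id = factor(1:10))
dat$rt <- rlnorm(nrow(dat), meanlog = 6, sdlog = 0.3)

two_stage_resample <- function(d) {
  ids <- sample(levels(d$id), replace = TRUE)      # stage 1: participants
  do.call(rbind, lapply(seq_along(ids), function(i) {
    di <- d[d$id == ids[i], ]
    di <- di[sample(nrow(di), replace = TRUE), ]   # stage 2: observations
    di$id <- factor(i)                             # relabel duplicated ids
    di
  }))
}

## 200 bootstrap iterations; in practice, refit lmer() on each
## resample and store its predictions instead of raw cell means
boot_cell_means <- replicate(200, {
  b <- two_stage_resample(dat)
  tapply(b$rt, b$cond, mean)
})
## boot_cell_means is 2 x 200: one bootstrap distribution per cell
```

Relabeling the resampled participant ids matters: if the same participant is drawn twice in stage 1, the two copies should enter the refit as distinct grouping levels.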

This yields distributions of predicted values for each cell of the
fixed-effects design, which can be used to generate CIs for each cell
and also to compute a CI for any effect/interaction. For
example, if I suspect (whether a priori or by looking at the t-values
from the original model) that there's an interaction between the
2-level and 4-level predictors, I can generate 2 useful graphs:
(1) collapse the 3-level predictor to a mean within each iteration and
plot the resulting set of 8 means and associated CIs.
(2) collapse the 3-level predictor to a mean within each iteration
*then* collapse the 2-level predictor to a difference score within
each iteration and plot the resulting set of 4 difference scores and
associated CIs.
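
Concretely, with the bootstrap predictions stored as an array, the two collapses might look like this in R. This is only a sketch: the array layout (iteration x 2-level x 3-level x 4-level) and the level names "a1"/"a2" are my own assumptions, and placeholder random numbers stand in for the model predictions:

```r
## Hypothetical bootstrap predictions:
## n_iter x 2-level x 3-level x 4-level array
set.seed(2)
n_iter <- 1000
boot_pred <- array(rnorm(n_iter * 2 * 3 * 4, mean = 500, sd = 20),
                   dim = c(n_iter, 2, 3, 4),
                   dimnames = list(NULL, c("a1", "a2"), NULL, NULL))

## (1) collapse the 3-level predictor within each iteration:
## 2 x 4 = 8 cell distributions, then percentile CIs per cell
cells   <- apply(boot_pred, c(1, 2, 4), mean)           # n_iter x 2 x 4
cell_ci <- apply(cells, c(2, 3), quantile, probs = c(.025, .975))

## (2) additionally collapse the 2-level predictor to a
## difference score within each iteration: 4 distributions
diffs   <- cells[, "a2", ] - cells[, "a1", ]            # n_iter x 4
diff_ci <- apply(diffs, 2, quantile, probs = c(.025, .975))
```

Because the collapsing happens inside each iteration, the resulting percentile intervals reflect the bootstrap variability of the means and difference scores themselves.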

If I have a numeric predictor in the model, I would obtain predictions
across a set of values spanning the range of this predictor (e.g. seq(
min(nIV) , max(nIV) , length.out = 1e3 ) ).
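
A minimal sketch of that grid construction, with `nIV` standing in as a hypothetical numeric predictor:

```r
## `nIV` is a placeholder for the model's numeric predictor
set.seed(3)
nIV  <- runif(100, min = 200, max = 800)
grid <- seq(min(nIV), max(nIV), length.out = 1e3)
## each bootstrap iteration would then yield a predicted value at
## every grid point (e.g. via predict() on the refitted model),
## giving 1e3 bootstrap distributions to summarize as a CI band
```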

One thing I like about this approach is that within each predictor I
don't have to specify an intercept level to which the other levels are
compared. Furthermore, since I typically deal with data that are
strongly positively skewed (human response times), I wonder whether
the nonparametric nature of bootstrapping actually improves inference
relative to simply examining the t-values from the original model or
anova()-based sub-model comparisons, both of which assume Gaussian
error as I understand it.

I'd appreciate feedback on the reasonableness of this approach.



Mike Lawrence
Graduate Student
Department of Psychology
Dalhousie University

~ Certainty is folly... I think. ~
