[R-sig-ME] Internal validation of a mixed effects model selected by backwards elimination on multiple bootstrap samples

erich.studerus at bli.uzh.ch
Wed Mar 16 16:41:39 CET 2011


Hello

I would like to perform an internal validation of a mixed effects model, analogous to Frank Harrell's validate function in the rms package. I have a pooled data set of 23 controlled experimental studies in which a hallucinogenic drug was administered to about 250 healthy volunteers on about 400 separate occasions. Some subjects took the drug only once, while others took it up to 6 times. To account for the clustering in my data, I'm using a mixed effects model in which the intercept is allowed to vary per subject.

The goal of my research is to predict the acute drug response from various pharmacological (e.g., drug dose) and non-pharmacological variables (pre-experience, personality, environment, sex, age, etc.). The full model contains 23 potentially important predictor variables. I have reduced the full model by backward elimination combined with a two-step bootstrap procedure to increase the stability of the selected model. I'm fully aware that backward elimination is dangerous because it can lead to severely over-optimistic models. To estimate the over-optimism of my model, I would like to do the following:
1. Draw a bootstrap sample with replacement
2. Apply backward elimination in the bootstrap sample.
3. Estimate the performance of the selected model in the bootstrap sample by a performance measure, such as Edwards R^2 (formula 19 in Edwards et al. 2008).
4. Estimate the performance of the selected model when applied to the original sample.
5. Calculate the difference between the estimates from steps 3 and 4.
6. Repeat several hundred times.

The mean of these differences across the bootstrap replicates is an estimate of the over-optimism.
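In code, the loop I have in mind would look roughly like the following. This is only a minimal sketch: dat, response, dose, age, and subj are placeholder names, backward_eliminate() merely stands in for my actual two-step elimination procedure, I resample whole subjects (a cluster bootstrap) rather than single rows to respect the clustering, and r2lmer() is the performance function given further below.

library(lme4)

## Placeholder for the actual two-step backward elimination; for the
## sketch it simply fits one fixed candidate model.
backward_eliminate <- function(d) {
   lmer(response ~ dose + age + (1 | subj), data = d)
}

optimism <- replicate(200, {
   ## 1. Cluster bootstrap: resample subjects with replacement.
   ids <- sample(unique(dat$subj), replace = TRUE)
   boot <- do.call(rbind, lapply(seq_along(ids), function(i) {
      d <- dat[dat$subj == ids[i], ]
      d$subj <- i  # relabel, so a subject drawn twice forms two clusters
      d
   }))
   ## 2.-3. Select a model in the bootstrap sample and compute its
   ## apparent performance there.
   fit.boot <- backward_eliminate(boot)
   r2.boot <- r2lmer(fit.boot)
   ## 4. Refit the selected fixed effects on the original data. This
   ## refitting is only a stand-in for "applying" the model to the
   ## original sample, which is exactly my open question below.
   fit.orig <- update(fit.boot, data = dat)
   r2.orig <- r2lmer(fit.orig)
   ## 5. Over-optimism of this replicate.
   r2.boot - r2.orig
})

## 6. The mean over replicates estimates the over-optimism.
mean(optimism)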

Do you think this approach is reasonable? What performance measure would you use? I have managed to calculate Edwards R^2 with the following function (I hope the implementation is correct):

r2lmer <- function(model) {
   require(aod) # for wald.test()
   if (!is(model, "mer")) stop("mer object expected")
   n <- model@dims[["n"]]  # number of observations
   p <- model@dims[["p"]]  # number of fixed-effects parameters
   df.mod <- n - p         # denominator degrees of freedom

   ## Wald F-test of all fixed effects except the intercept;
   ## as.matrix() coerces the Matrix-class object returned by vcov()
   wald.model <- wald.test(b = fixef(model), Sigma = as.matrix(vcov(model)),
                           df = df.mod, Terms = 2:p)
   wald.F <- as.numeric(wald.model$result$Ftest[1])

   ## Edwards et al. (2008), formula 19
   ((p - 1) * wald.F / df.mod) / (1 + (p - 1) * wald.F / df.mod)
}
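For example, with the same placeholder names as in the sketch above:

fit <- lmer(response ~ dose + age + (1 | subj), data = dat)
r2lmer(fit)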

The problem is that I don't know how to calculate Edwards R^2 when a model is applied to a new data set. Does anybody know? I suspect there could be a problem with the random effects, because not all subjects of the original sample are contained in the bootstrap sample. Thanks a lot!

Erich


