[R-sig-ME] predictions from a binomial GLMM

Fri Apr 13 17:57:29 CEST 2012

Malcolm Fairbrother <m.fairbrother at ...> writes:

> This issue has come up before, but after reading previous threads
> I'm still not quite clear on how to do this.  After fitting a
> binomial model, with "cbind(successes, failures)" as the outcome and
> a single covariate "x", how would one generate predicted
> probabilities for different values of x? Below is an example, with
> my attempt to do this, where I am interested primarily in plotting a
> 95% confidence interval for predictions for a new/hypothetical unit
> (i.e., one not included in the data and thus for which no random
> effect has been estimated).

>  I thought that using a parametric bootstrap would be appropriate
> and reasonably straightforward, allowing me to get predictions with
> something like: newdata %*% fixef(mod) .  However, from what I've
> seen (e.g., on Ben Bolker's wiki at http://glmm.wikidot.com/faq, and
> related threads such as
> https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q2/016356.html),
> something more is needed… though I'm not clear on whether that
> remains true: (a) if one uses a bootstrap, and (b) if the
> predictions are for new units (rather than ones for which the model
> has produced random effects).

  Hmm.  I can't see that you're missing anything obvious ...
the parametric bootstrap _should_ take care of the issue of
the uncertainty in the random-effects parameters (I think ...)

  I tweaked your code in a few places, but all the changes
are essentially cosmetic.  In this case, interestingly (?)
there doesn't seem to be much difference between the naive
(plug-in fixed effect variances only) CIs and the ones
computed in this way.

--------------
library(lme4.0)
library(parallel)  ## newer than multicore, otherwise not much difference
n <- 30
set.seed(101)      ## set RNG seed ...
dat <- data.frame(obs=1:n, x=runif(n, -1, 1))
dat <- with(dat, data.frame(dat, 
       y=rbinom(n, 10, prob=plogis(-1 + 2*x + rnorm(n)))))
(mod <- lmer(cbind(y,10-y) ~ x + (1 | obs), dat, family=binomial))
sims <- do.call("rbind", mclapply(1:200, 
     function(x) fixef(refit(mod, simulate(mod)))))
## parametric bootstrap
## define datapoints for which to calculate predictions
newdat <- cbind(1,seq(-1,1,0.1)) 
preds <- newdat %*% t(sims) # get predictions for each resample
eta_CI <- t(apply(preds, 1, function(x) quantile(x, c(0.025,0.975))))
CI <- plogis(eta_CI) # convert the lower bound to the data scale
colnames(CI) <- c("lwr","upr")
pred <- data.frame(x=newdat[,2],eta=newdat %*% fixef(mod),CI)
pred$m <- plogis(pred$eta)
library(ggplot2)
(q1 <- qplot(x,m,ymin=lwr,ymax=upr,data=pred,geom="line")+
    geom_ribbon(colour=NA,alpha=0.3)+theme_bw())

## compare naive prediction intervals:
eta_sd <- sqrt(diag(newdat %*% vcov(mod) %*% t(newdat)))
CI2 <- data.frame(x=pred$x,
      m=pred$m,plogis(pred$eta+t(1.96*outer(c(-1,1),eta_sd))))
names(CI2)[3:4] <- c("lwr","upr")

q1+geom_ribbon(data=CI2,colour=NA,alpha=0.3,fill="blue")