[R-sig-ME] prediction intervals and lmer

Ben Bolker bbolker at gmail.com
Fri Aug 30 19:05:13 CEST 2013


Muldoon, Ariel <Ariel.Muldoon at ...> writes:

> 
> Today I was thinking about prediction intervals, and 
> happened to be look through the code for calculating
> prediction intervals for lmer objects from the 
> always handy http://glmm.wikidot.com/faq.  The way the
> variance is calculated doesn't make sense to me, so 
> I'm wondering if I'm missing something.
> 
> The code for calculating the variances from the wikidot site:
> fm1 <- lmer(
>     formula = distance ~ age*Sex + (age|Subject)
>     , data = Orthodont
> )
> newdat <- expand.grid(
>     age=c(8,10,12,14)
>     , Sex=c("Male","Female")
>     , distance = 0
> )
> mm <- model.matrix(terms(fm1),newdat)
> newdat$distance <- mm %*% fixef(fm1)
> pvar1 <- diag(mm %*% tcrossprod(vcov(fm1),mm))
> tvar1 <- pvar1+VarCorr(fm1)$Subject[1]
> 
> It's this piece that I'm having trouble with:
> 
> VarCorr(fm1)$Subject[1].
> 
> This is the variance for the random intercept term.  
> If I had been doing this on my own, I would have used the
> residual variance (conditional variance of the response) 
> in building prediction intervals.  I can pull
> the residual variance out with
> 
> getME(fm1, "sigma")^2
> 
> or (uglier)
> 
> attr(VarCorr(fm1), "sc")^2.
> 
> Am I missing something fundamental about prediction 
> intervals and mixed models?
> 

  If you look carefully at the FAQ code, you'll see that the 
lme and glmmADMB code do include residual variance terms.  The page
itself is a little bit vague about which computed values are
confidence intervals are confidence intervals and which are prediction
intervals; the values given for lme4 are confidence intervals,
incorporating either (1) only the uncertainty on the fixed-effect
parameters (beta) or (2) uncertainty on beta plus variation due
to random effects [this would be a sort-of confidence interval for 
a population-level prediction, i.e. the expected variation
(conditional on the random-effect parameter effects) of the mean
of a large number of samples from a _single_ previously unobserved
block].   The problem (or at least, my problem) with setting up
generic confidence/prediction intervals for mixed models is thinking
clearly about which sources of variation one wants to (1) ignore,
(2) condition on, (3) marginalize over ...  (Also, note that none
of these approaches allows for the uncertainty of the random-effects
parameters.)



More information about the R-sig-mixed-models mailing list