[R-sig-ME] Quantifying uncertainty for predictions from lme4 (predict.merMod)

Ben Bolker bbolker at gmail.com
Thu Aug 1 05:53:14 CEST 2013


Tommaso Jucker <tommasojucker at ...> writes:

> 
> Dear all,
> 
> I have the following mixed effects model with two crossed random effects
> which I am using to model tree growth:
> 
> Model <- lmer (log(Growth) ~ log(Size) + log(Competition) + (1 + log(Size)
> | Treatment) + (1 | Plot))
> 
> I am using this model to predict new data, and was glad to see that the
> development version of lme4 now comes with a predict function that allows
> both fixed and random effects to be used to generate predictions. 
> However I
> also need to be able to estimate the uncertainty in the predictions I am
> making, which is a problem since predict() in lme4 doesn't generate SEs for
> predictions.

  The reason this feature is missing is that it's difficult, without
doing full-on MCMC or parametric bootstrapping, to generate standard errors
that appropriately incorporate all sources of error (in particular
the uncertainty in the random-effects parameters). Furthermore, which
components of the error are included depends in a complicated way on
which components of the model are conditioned on in generating the
prediction.  For example, if predictions are generated at the population
level, setting random effects to zero, then (maybe?) the random effects
variance should be incorporated in the prediction error.  Getting all
of this right is quite tricky.
    
> I have tried two alternative approaches. The first was to use simulate() to
> generate a distribution of predicted values which I could then summarize
> into uncertainty estimates. However, I found that the output of simulate()
> is clearly different from that of predict(), regardless of how I treated
> the use.u argument relating to random effects. When I take the mean
> predicted value across 1000 or more simulations and compare it to the
> output of predict(), it is clear that the two methods are producing
> different results.
 
  Congratulations, you found a big bug in simulate() with use.u=TRUE. If you
reinstall the development version from GitHub and use use.u=TRUE, you should
get results whose mean agrees with predict().  If you instead use use.u=FALSE
and take standard deviations across simulations, you get the uncertainty of
population-level predictions due to random effects and residual error.  This
approach also generalizes to GLMMs.
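
  A minimal sketch of that simulate()-based recipe (the data frame name
`dat` is a placeholder for the original poster's data):

```r
library(lme4)

## hypothetical refit of the model from the question ('dat' is assumed)
fit <- lmer(log(Growth) ~ log(Size) + log(Competition) +
              (1 + log(Size) | Treatment) + (1 | Plot), data = dat)

## use.u = FALSE: draw new random effects *and* residuals in each
## simulation, i.e. population-level prediction uncertainty
sims <- simulate(fit, nsim = 1000, use.u = FALSE)

pred.mean <- rowMeans(sims)     # should agree with population-level predictions
pred.sd   <- apply(sims, 1, sd) # prediction SD from RE + residual variation
```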

  This approach will get you predictions that incorporate uncertainty
due *only* to stochasticity (not to parameter uncertainty), which is
probably not what you want.

  Another approach is given in http://glmm.wikidot.com/faq ...
this gets variation due to the fixed-effect uncertainty only --
although for LMMs you can add residual error fairly easily.
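
  That FAQ recipe computes the variance of the fixed-effect part of the
prediction from the fixed-effect variance-covariance matrix.  A rough
sketch (`fit` stands for the fitted merMod from the model in the question,
and `newdat` is made-up new data; note this ignores uncertainty in the
random-effects parameters):

```r
## hypothetical new data on the original (untransformed) scale
newdat <- expand.grid(Size = c(10, 20, 40), Competition = 5)

## fixed-effects design matrix for the new data
mm <- model.matrix(~ log(Size) + log(Competition), newdat)

## population-level point predictions (random effects set to zero)
newdat$pred <- drop(mm %*% fixef(fit))

## prediction variance from fixed-effect uncertainty only
pvar <- diag(mm %*% tcrossprod(as.matrix(vcov(fit)), mm))

## for an LMM, add the residual variance to approximate prediction error
tvar <- pvar + sigma(fit)^2
newdat$lwr <- newdat$pred - 1.96 * sqrt(tvar)
newdat$upr <- newdat$pred + 1.96 * sqrt(tvar)
```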

> The second approach was to use the bootMer function as recommended in the
> help file for predict(). From this I was able to obtain SE for the
> parameter estimates. However, I'm not quite sure how to then translate
> these into uncertainty in the predictions (i.e., how do I obtain SEs for
> the predicted values?). Am I missing something obvious?

  
  You need to vary the FUN argument.

  If you pass FUN=predict, you should get results that incorporate
uncertainty in all model components.
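
  A hedged sketch of that (with current lme4, predict.merMod takes a
re.form argument; `fit` stands for the original poster's fitted model):

```r
library(lme4)

## parametric bootstrap of the population-level predictions;
## re.form = NA sets all random effects to zero
bb <- bootMer(fit, FUN = function(m) predict(m, re.form = NA), nsim = 500)

## bb$t is an nsim x n matrix of bootstrapped predictions:
pred.se <- apply(bb$t, 2, sd)                                    # SEs
pred.ci <- t(apply(bb$t, 2, quantile, probs = c(0.025, 0.975)))  # 95% CIs
```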

  If you pass FUN=simulate, you should get prediction errors ...
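
  For instance, bootstrapping simulated responses rather than fitted
values (again only a sketch; `fit` is the fitted model):

```r
## quantiles of bootstrapped *simulated* responses give prediction intervals
bb2 <- bootMer(fit, FUN = function(m) unlist(simulate(m)), nsim = 500)
pred.int <- t(apply(bb2$t, 2, quantile, probs = c(0.025, 0.975)))
```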

  Ben Bolker


