[R-sig-ME] MCMCglmm predictions with new data, marginalized with respect to random effects

Sun May 31 19:59:05 CEST 2015

Dear Alberto,
If you haven't hit on them already, these two earlier threads may also be
useful to you (both conversations where Jarrod, author of MCMCglmm, helped
me out):
http://permalink.gmane.org/gmane.comp.lang.r.lme4.devel/11709
http://permalink.gmane.org/gmane.comp.lang.r.lme4.devel/9424
Your case is in some respects simpler, since as I understand it your
outcome is Normal--though you have some other model elements which (I'm not
sure) may complicate things.
Hope that's useful,
Malcolm

> Date: Sat, 30 May 2015 21:17:52 +0000
> From: "Lenth, Russell V" <russell-lenth at uiowa.edu>
> To: Alberto Gallano <alberto.gc8 at gmail.com>
> Cc: "r-sig-mixed-models at r-project.org"
>         <r-sig-mixed-models at r-project.org>
> Subject: Re: [R-sig-ME] MCMCglmm predictions with new data,
>         marginalized with respect to random effects
>
> You can streamline the process you describe somewhat using the lsmeans
> package; For your example, it'd be something like:
>
> # -----------------------------------------
>   library(lsmeans)
>
>   fit1.rg <- ref.grid(fit1,
>     at = list(I2.species.mean = seq(5, 12, by = 0.05),
>       body.mass.species.mean = seq(1000, 5000, by = 100)),
>     data = incisor.dat)
>
>   # frequentist summary -averages and SEs based on sample of fixed effects
>   summary(fit1.rg)
>
>   # posterior sample of predictions at each point in the grid
>   #   for further analysis using the coda package
>   #   (this will be too much, but could be useful for a coarser grid)
>   fit1.mcmc <- as.mcmc(fit1.rg)
>
> #--------------------------------------------
>
> Note in the ref.grid call, it was not necessary to specify the value of
> I2.within.species or body.mass.within.species, because the mean will be
> used by default. See ?ref.grid, ?"lsmeans-package", and ?models for more
> information.
>
> This does not address the part about incorporating the latent temp
> effects. Perhaps some others' answers could provide insights.
> Russ
>
> PS - is the incisor.dat dataset publicly available somewhere?
>
> Russell V. Lenth  -  Professor Emeritus
> Department of Statistics and Actuarial Science
> The University of Iowa  -  Iowa City, IA 52242  USA
> Voice (319)335-0712 (Dept. office)  -  FAX (319)335-3017
>
>
>
> -----Original Message-----
>
> Date: Wed, 27 May 2015 21:29:41 -0400
> From: Alberto Gallano <alberto.gc8 at gmail.com>
> To: r-sig-mixed-models at r-project.org
> Subject: [R-sig-ME] MCMCglmm predictions with new data, marginalized
>         with respect to random effects
>
>
> I'm trying to calculate predictions for new data (with confidence
> intervals) that are marginalized with respect to the random effects. I've
> been able to get part of the way there thanks to the suggestions of Ben
> Bolker in this post:
>
>
> http://stackoverflow.com/questions/21057548/how-can-one-simulate-quantities-of-interest-from-the-posterior-density-in-mcmcgl
>
>
> I have two questions:
>
>
> 1) how can I marginalize with respect to the random effects when using new
> data?
>
> (In other words, how should the information in the "temp" variable below be
> incorporated into the "pred_frame" object?. Here, "temp" is a vector of
> length 212 (106 random effects for each random term in the model formula -
> I have 106 species in the dataset). Pages 46-47 of the course notes suggest
> that the variance components should be summed (sigma^2) and then added to
> the linear predictor to get the expectation of the outcome variable, but
> I'm not sure how to do this with new data. Do I need to add the random
> effects to the "pred_frame" object?)
>
>
> 2) how can I calculate confidence intervals (or standard errors) for the
> predictions?
>
> (ideally while marginalizing with respect to the random effects, but if
> that's not possible, just using the fixed effects).
>
>
>
> #--------------------------------------------------------------------
> prior1 <- list(
>     B = list(mu = rep(0, 5), V = diag(1e+08, 5)),
>     G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000),
>              G2 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 1000)),
>     R = list(V = 1, nu = 0.002)
> )
>
> set.seed(1234)
> fit1 <- MCMCglmm(
>     fixed = I1 ~ I2.species.mean + I2.within.species +
> body.mass.species.mean + body.mass.within.species,
>     random = ~ phylo + species,
>     rcov = ~ units,
>     data = incisor.dat,
>     family = "gaussian",
>     ginverse = list(phylo = inv_phylo_mat$Ainv),
>     prior = prior1,
>     pr = TRUE,
>     pl = TRUE,
>     nitt = 1.1e+7, thin = 2000, burnin = 5e+5,
>     verbose = FALSE
>     )
>
> # prediction data frame
> pred_frame <- expand.grid(
>     I2.species.mean = seq(5, 12, by = 0.05),
>     I2.within.species = mean(incisor.dat$I2.within.species),
>     body.mass.species.mean = seq(1000, 5000, by = 100),
>     body.mass.within.species = mean(incisor.dat$body.mass.within.species)
>     )
>
> # fixed effects design matrix
> X <- model.matrix(~ I2.species.mean + I2.within.species +
>                     body.mass.species.mean + body.mass.within.species,
>                     data = pred_frame)
>
> # multiply the fixed effects design matrix (X) by the chains of
> coefficients
> # stored in the Sol ("solution") component of the MCMCglmm object
> pred_frame$I1 <- X %*% t(fit1$Sol[, colnames(X)])
>
> # random effects design matrix (with intercept suppressed)
> Z <- model.matrix(~ phylo + taxon - 1, data = incisor.dat)
>
> # multiply the random effects design matrix (Z) by the $Liab component
> (posterior distributions of latent variables)
> # (note: set pl = TRUE when running the model to get it to save the latent
> effects)
> temp <- colMeans(fit1$Liab %*% Z)
> sigma2 <- sum(temp)
> #--------------------------------------------------------------------
>
>
> best,
> Alberto
>
>

	[[alternative HTML version deleted]]