[R-sig-ME] Prediction interval for the difference of a pair of outcomes

Tue Mar 14 11:02:53 CET 2017

Dear all,

I'll try to keep this as brief as possible. Some time ago I asked about joint 
(linear mixed-effects) modelling of two correlated outcomes, of the form:

m1 <- lmer(Y1 ~ age + X + (age | id), data=dat)
m2 <- lmer(Y2 ~ age + X + (age | id), data=dat)

Thierry Onkelinx very helpfully suggested the following:

library(tidyr)
long <- gather(dat, key = "trait", value = "Y", Y1, Y2)
lmer(Y ~ 0 + trait + trait * (age + X) + (0 + trait:age | id), data = long)

which was perfect and indeed worked as expected.

Now I am trying to obtain predictions (with prediction intervals) from this 
model. I've read (among other things) the instructions on glmm.wikidot.com/faq, 
and also this past message:

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020809.html

Thus I have two questions:

(1) If my interest is in predicting the response of a single unobserved 
individual, accounting for all the random effects (=marginalizing over the 
random effects, I think), which variances should I add together to construct 
my prediction interval? 

My understanding is that I would have to add (a) the variance due to the fixed 
coefficients (betas), (b) the variance due to the random effects, AND (c) the 
residual variance. Is this correct?

I've become rather confused because, among the examples provided on the 
glmm.wikidot.com/faq page, the lme and glmmADMB examples add (a) + (c), 
ignoring (b), while the lme4 example adds (a) + (b), ignoring (c). 
Some extra detail on why they differ would help a lot.
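
For concreteness, the (a) + (b) + (c) calculation in question (1) might look 
like the rough sketch below, for trait Y1 only. All numbers are made up for 
illustration; with lme4 the inputs would presumably come from fixef(), vcov(), 
VarCorr() and sigma() on the fitted model. The sketch assumes a normal 
approximation and ignores the uncertainty in the variance parameters 
themselves.

```r
# Hypothetical stand-ins for quantities extracted from the fitted joint model.
# Fixed-effect order assumed: traitY1, traitY2, age, X, traitY2:age, traitY2:X
beta <- c(traitY1 = 10, traitY2 = 8, age = 0.5, X = 1.2,
          "traitY2:age" = -0.2, "traitY2:X" = 0.3)
V_beta <- diag(c(0.20, 0.20, 1e-4, 0.05, 2e-4, 0.08))  # covariance of the betas
V_beta[1, 2] <- V_beta[2, 1] <- 0.05
G <- matrix(c(0.0010, 0.0004,
              0.0004, 0.0008), 2, 2)  # covariance of the two random age slopes
sigma2 <- 1.5                         # residual variance

# A new (unobserved) individual, described by hypothetical covariate values
age <- 40
X <- 1
x <- c(1, 0, age, X, 0, 0)  # fixed-effects design row for trait Y1
z <- c(age, 0)              # random-effects design row for trait Y1

var_fixed <- drop(t(x) %*% V_beta %*% x)  # (a) uncertainty in the betas
var_ranef <- drop(t(z) %*% G %*% z)       # (b) between-individual variation
var_resid <- sigma2                       # (c) residual variation
var_total <- var_fixed + var_ranef + var_resid

pred <- sum(x * beta)
pi95 <- pred + c(-1, 1) * qnorm(0.975) * sqrt(var_total)
```

Dropping var_ranef or var_resid from var_total reproduces the two narrower 
variants seen in the FAQ examples.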

(2) Assume that I do indeed need (a) + (b) + (c) to calculate prediction 
intervals from the model described above, for one outcome (trait) only. 
What if I want to calculate prediction intervals for the **difference** 
between my two modelled outcomes, for a single unobserved individual? 
Which variances and covariances should I account for?

Given that Var(a*A + b*B) = a^2*Var(A) + b^2*Var(B) + 2*a*b*Cov(A,B), 
I can calculate (b), i.e. the variance due to the random effects, using the 
correlations between the random effects. 
I also think I'll need to add **twice** the residual variance (c); is this 
correct?
What I'm more in doubt about is (a), i.e. the variance due to the fixed 
effects. Don't I somehow have to account for the correlations between my 
fixed-effects terms, and if so, how should I go about doing that?
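
One possible way to handle (a) for the difference is to work with a contrast 
of the two design rows: the quadratic form d' V d over the full covariance 
matrix of the betas then picks up every covariance between fixed-effects 
terms. The sketch below uses made-up numbers (with lme4, V_beta would 
presumably be vcov() of the fit) and assumes, as the lmer model above implies, 
that the two traits share one residual variance and have independent 
residuals.

```r
# Hypothetical stand-ins for quantities extracted from the fitted joint model.
# Fixed-effect order assumed: traitY1, traitY2, age, X, traitY2:age, traitY2:X
beta <- c(10, 8, 0.5, 1.2, -0.2, 0.3)
V_beta <- diag(c(0.20, 0.20, 1e-4, 0.05, 2e-4, 0.08))  # covariance of the betas
V_beta[1, 2] <- V_beta[2, 1] <- 0.05
G <- matrix(c(0.0010, 0.0004,
              0.0004, 0.0008), 2, 2)  # covariance of the two random age slopes
sigma2 <- 1.5                         # common residual variance

age <- 40
X <- 1
x1 <- c(1, 0, age, X, 0, 0)      # fixed-effects row for Y1
x2 <- c(0, 1, age, X, age, X)    # fixed-effects row for Y2
x_d <- x1 - x2                   # contrast for Y1 - Y2
z_d <- c(age, 0) - c(0, age)     # same contrast on the random slopes

var_fixed <- drop(t(x_d) %*% V_beta %*% x_d)  # (a) includes beta covariances
var_naive <- sum(x_d^2 * diag(V_beta))        # (a) if covariances were ignored
var_ranef <- drop(t(z_d) %*% G %*% z_d)       # (b) uses Cov of random slopes
var_resid <- 2 * sigma2                       # (c) twice the residual variance
var_total <- var_fixed + var_ranef + var_resid

pred_diff <- sum(x_d * beta)
pi95 <- pred_diff + c(-1, 1) * qnorm(0.975) * sqrt(var_total)
```

Comparing var_fixed with var_naive shows how much the covariances between the 
fixed-effects estimates change the answer for these particular numbers.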

An extension of this last question: say I move away from lme4 and run this 
particular model in JAGS (in fact I did this already as an experiment), 
calculating (b) and (c), and thus the limits of my 95% PI (for the difference 
between the two outcomes), over the whole MCMC chain.

Since integrating over the MCMC chain automatically propagates the uncertainty 
in any derived parameter (in this case, the difference between the 
outcomes), shouldn't this account for any correlation between my fixed-effects 
terms as well? Would that solve the problem?
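
To illustrate the idea, here is a minimal sketch of a simulation-based 
interval computed draw by draw. The "posterior draws" are simulated stand-ins 
with made-up values, not real JAGS output; in practice each row would be one 
saved MCMC iteration. Because each row of beta_draws is used as a whole, any 
correlation between the fixed-effects terms is carried through to the 
difference.

```r
set.seed(1)
n_draws <- 4000

# Stand-in draws of the six fixed effects, with correlated columns.
# Order assumed: traitY1, traitY2, age, X, traitY2:age, traitY2:X
mu_beta <- c(10, 8, 0.5, 1.2, -0.2, 0.3)
V_beta <- diag(c(0.20, 0.20, 1e-4, 0.05, 2e-4, 0.08))
V_beta[1, 2] <- V_beta[2, 1] <- 0.05
beta_draws <- matrix(rnorm(n_draws * 6), n_draws) %*% chol(V_beta)
beta_draws <- sweep(beta_draws, 2, mu_beta, "+")

# Stand-in draws of the random-slope SDs, their correlation, and residual SD
sd1_draws <- sqrt(0.0010) * exp(rnorm(n_draws, 0, 0.10))
sd2_draws <- sqrt(0.0008) * exp(rnorm(n_draws, 0, 0.10))
rho_draws <- pmin(pmax(rnorm(n_draws, 0.45, 0.05), -0.99), 0.99)
sigma_draws <- sqrt(1.5) * exp(rnorm(n_draws, 0, 0.05))

age <- 40
X <- 1
x_d <- c(1, -1, 0, 0, -age, -X)  # fixed-effects contrast for Y1 - Y2

# For each draw, simulate one new individual: correlated random slopes (b)
# and independent residuals for the two traits (c)
u1 <- rnorm(n_draws)
u2 <- rnorm(n_draws)
b1 <- sd1_draws * u1
b2 <- sd2_draws * (rho_draws * u1 + sqrt(1 - rho_draws^2) * u2)
e1 <- rnorm(n_draws, 0, sigma_draws)
e2 <- rnorm(n_draws, 0, sigma_draws)

# Simulated Y1 - Y2 for a new individual, one value per draw
d_new <- drop(beta_draws %*% x_d) + age * (b1 - b2) + e1 - e2
pi95 <- quantile(d_new, c(0.025, 0.975))
```

Taking quantiles of d_new, rather than averaging interval limits computed 
within each draw, is a design choice of this sketch, not necessarily what the 
JAGS run described above did.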

Any help is greatly appreciated!

Thanks in advance,

Theodore Lytras

Epidemiologist, PhD student
Hellenic Centre for Disease Control and Prevention