[R-sig-ME] Prediction interval for the difference of a pair of outcomes
Theodore Lytras
thlytras at gmail.com
Tue Mar 14 11:02:53 CET 2017
Dear all,
I'll try to keep this as brief as possible. Some time ago I asked about joint
(linear mixed-effects) modelling of two correlated outcomes, of the form:
m1 <- lmer(Y1 ~ age + X + (age | id), data=dat)
m2 <- lmer(Y2 ~ age + X + (age | id), data=dat)
Thierry Onkelinx very helpfully suggested the following:
library(tidyr)
long <- gather(dat, key = "trait", value = "Y", Y1, Y2)
lmer(Y ~ 0 + trait + trait * (age + X) + (0 + trait:age | id), data = long)
which was perfect and indeed worked as expected.
Now, I am trying to draw predictions (+prediction intervals) from this model.
I've read (among other stuff) the instructions on glmm.wikidot.com/faq, and
also this past message:
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2013q3/020809.html
Thus I have two questions:
(1) If my interest is in predicting the response of a single unobserved
individual, accounting for all the random effects (=marginalizing over the
random effects, I think), which variances should I add together to construct
my prediction interval?
My understanding is that I would have to add (a) the variance due to the fixed
coefficients (betas), (b) the variance due to the random effects, AND (c) the
residual variance. Is this correct????
I've become rather confused because among the examples provided on the
glmm.wikidot.com/faq page, I see the lme and glmmADMB examples add (a) + (c),
ignoring the (b), while the lme4 example adds (a) + (b), ignoring the (c).
Maybe some extra detail would help a lot.
(2) Assuming that I do indeed need (a) + (b) + (c) to calculate prediction
intervals from my above described model, for one outcome (trait) only.
What if I want to calculate prediction intervals for the **difference**
between my two modelled outcomes, for a single unobserved individual?
Which variances and covariances should I account for??
Given that: Var(a*A + b*B) = a^2*Var(A) + b^2*Var(B) + 2*a*b*cov(A,B) ,
I can calculate (b), i.e. the variance due to the random effects, using the
correlations between the random effects.
Also I think I'll need to add **twice** the residual variance (c); is this
correct??
What I'm more in doubt about is the (a), i.e. the variance due to the fixed
effects. Don't I somehow have to account for the correlations between my fixed
effects terms, and if so, how should I go about doing that??
An extension of this latest question: say I move away from lme4 and run this
particular model in JAGS (in fact I did this already as an experiment),
calculating (b) and (c), and thus the limits of my 95% PI (for the difference
between the two outcomes), over the whole MCMC chain.
Since integrating over the MCMC chain automatically estimates the uncertainty
over any derived parameter (in this case, the difference between the
outcomes), shouldn't this account for any correlation between my fixed effects
terms as well? Thus problem solved??
Any help is greatly appreciated!
Thanks in advance,
Theodore Lytras
Epidemiologist, PhD student
Hellenic Centre for Disease Control and Prevention
More information about the R-sig-mixed-models
mailing list