[R-sig-ME] subject level predictions with lme4 from incomplete longitudinal profile

Mon Nov 3 23:47:54 CET 2014

Tarca, Adi <atarca at ...> writes:

> 
> Dear all,

> I am interested in using lme4 to make subject level predictions from
> longitudinal data. I have 7 longitudinal observations for over 100
> subjects to fit the model (Call it z1), and the goal would be to use
> info from the first 2 observations of a new subject to make
> predictions for the remaining 5 time points. One way seems to be to
> add the first 2 time points of the new subject to the dataset of the
> other subjects with full longitudinal sets and refit the model to
> get the required random coefficients for the new subject in order to
> make predictions. My question is whether refitting the model could
> be avoided and use info from fitted model z1 as well as the design
> matrix and response values for the first 2 time points of the new
> subject to compute its random coefficients so that subject leve l
> predictions can be obtained.

  This is an interesting (= not completely easy!) question.

  In theory, you should be able to work out the conditional mode
of a new value, conditional on the estimated random effects variance-cov.
matrix for the model.

  For a scalar random effect (e.g. an intercept-only random effect),
this would just be a variance-weighted average, e.g. if you have
information from a new data set with mean xbar_i and variance v_i,
and the overall model mean is Xbar with an among-groups variance of
V, then your predicted value should be  

(xbar_i/v_i+Xbar/V)/(1/v_i+1/V)

Off the top of my head, I'm not sure how this would work for
a multivariate random effect (i.e., random-slopes model).  In
principle, even from two data points you can get an estimate
of the intercept and slope and their variance-covariance matrix
(although the variance-covariance matrix will be very poorly
estimated -- I'm not sure what to do about the fact that the
estimated of sigma-squared, RSS/(n-2), is NA ... do you just
use RSS/n instead??)

  Once (if??) you get this information for the new observations,
you have to figure out how to do the multivariate analogue of
the scalar variance-weighted average shown above.

   Further ideas welcome ...

> See below an illustration using a subset of the entire dataset:
> 
> #get file http://bioinformaticsprb.med.wayne.edu/shares/uspar.zip and unzip it
> > load("uspar.RData")

## snip

> >
> > library(lme4)
> > library(lattice)
> > print(xyplot(Y ~ t | ID, dat, aspect = "xy",
> +              layout = c(4,4), type = c("g", "p", "r"),
> +              xlab = "t",
> +              ylab = "Y"))
> >
> > #so straight lines are not good enough, therefore will try some
transformations of the time t
> > dat$X=log(dat$t)
> > dat$X2=dat$t*log(dat$t)
> >
> > #fit a mixed model without subject 14
> > z1=lmer(Y~X+X2+(1+X|ID),data=dat[1:87,])
> > coef(z1)

## snip

> > #Question:
> > #Say I only had the first 2 time points for subject 14 and wanted to make
> > #subject level predictions for it for all time points.
> > #One way seems to be to fit a new model z2 with by including the first 2
points
> > # of this subject.
> >
> > z2=lmer(Y~X+X2+(1+X|ID),data=dat[1:89,])
> > coef(z2)
> $ID
>    (Intercept)        X           X2
> 1  -0.52677880 1.324643 -0.004759445
> 2  -0.46715932 1.296103 -0.004759445
> 3  -0.93644456 1.423830 -0.004759445
> 4  -0.95853280 1.415826 -0.004759445
> 5  -0.17203704 1.221526 -0.004759445
> 6  -0.24932519 1.241990 -0.004759445
> 7  -0.42432616 1.290164 -0.004759445
> 8  -0.15979241 1.234259 -0.004759445
> 9  -0.55790691 1.325007 -0.004759445
> 10 -0.03503148 1.197623 -0.004759445
> 11 -0.87957274 1.418050 -0.004759445
> 12 -0.97501533 1.424990 -0.004759445
> 13 -0.17154433 1.211794 -0.004759445
> 14 -0.65875042 1.348014 -0.004759445
> 
> attr(,"class")
> [1] "coef.mer"
> >
> > dat$Ypred=predict(z2,dat)
> > print(xyplot(Ypred ~ Y | ID, dat, aspect = "xy",
> +              layout = c(4,4), type = c("g", "p", "r"),
> +              xlab = "Y",
> +              ylab = "Ypred"))
> >
> > #Q1, Is there another way that would not require fitting model z2 but
simply use
> > # info form z1 and
> > dat[88:89,c("X","X2","Y")]
>           X       X2        Y
> 88 2.706981 40.56130 2.803360
> 89 2.843977 48.87079 2.933857
> 
> Thanks,
> Adi Tarca
>