[R-sig-ME] subject level predictions with lme4 from incomplete longitudinal profile

Tarca, Adi atarca at med.wayne.edu
Wed Nov 5 15:09:30 CET 2014


Dear Ben,
Thank you for the kind response. I realized it might not be easy, but does data balance simplify things in any way? That is, if all subjects used to fit model z1 had the same number of observations taken at exactly the same values of the predictors?
Regards,
Adi 


-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Bolker
Sent: Monday, November 03, 2014 5:48 PM
To: r-sig-mixed-models at r-project.org
Subject: Re: [R-sig-ME] subject level predictions with lme4 from incomplete longitudinal profile

Tarca, Adi <atarca at ...> writes:

> 
> Dear all,
 
> I am interested in using lme4 to make subject level predictions from 
> longitudinal data. I have 7 longitudinal observations for over 100 
> subjects to fit the model (Call it z1), and the goal would be to use 
> info from the first 2 observations of a new subject to make 
> predictions for the remaining 5 time points. One way seems to be to 
> add the first 2 time points of the new subject to the dataset of the 
> other subjects with full longitudinal sets and refit the model to get 
> the required random coefficients for the new subject in order to make 
> predictions. My question is whether refitting the model could be
> avoided by using info from the fitted model z1, together with the
> design matrix and response values for the first 2 time points of the
> new subject, to compute its random coefficients so that subject-level
> predictions can be obtained.

  This is an interesting (= not completely easy!) question.

  In theory, you should be able to work out the conditional mode of a new value, conditional on the estimated random-effects variance-covariance matrix for the model.

  For a scalar random effect (e.g. an intercept-only random effect), this would just be a variance-weighted average: if you have information from a new data set with mean xbar_i and variance v_i, and the overall model mean is Xbar with an among-group variance of V, then your predicted value should be

(xbar_i/v_i+Xbar/V)/(1/v_i+1/V)
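
A toy numerical illustration of that weighted average (the numbers here are made up, not taken from any model):

xbar_i <- 2.8; v_i <- 0.05   # hypothetical new-group mean and its variance
Xbar   <- 2.5; V   <- 0.20   # hypothetical overall mean and among-group variance
(xbar_i/v_i + Xbar/V) / (1/v_i + 1/V)
## 2.74 -- lies between xbar_i and Xbar, closer to xbar_i because v_i < V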


Off the top of my head, I'm not sure how this would work for a multivariate random effect (i.e., a random-slopes model).  In principle, even from two data points you can get an estimate of the intercept and slope and their variance-covariance matrix (although the variance-covariance matrix will be very poorly estimated -- I'm not sure what to do about the fact that the estimate of sigma-squared, RSS/(n-2), is NA ... do you just use RSS/n instead??)
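
(For what it's worth, the zero-degrees-of-freedom problem is easy to see with lm(); this assumes the 'dat' object from the quoted example below, where rows 88:89 hold the new subject's two observations:)

fit2 <- lm(Y ~ X, data = dat[88:89, ])  # two points, two coefficients
coef(fit2)                              # intercept and slope are still defined
summary(fit2)$sigma                     # NaN: RSS/(n - 2) = 0/0, no residual df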

  Once (if??) you get this information for the new observations, you have to figure out how to do the multivariate analogue of the scalar variance-weighted average shown above.

   Further ideas welcome ...
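
  For what it's worth, here is a rough sketch of how the conditional modes for a new subject might be computed directly from z1, treating the estimated fixed effects and variance components as known (it uses the 'z1' and 'dat' objects from the quoted example below; the helper name is made up, and it ignores the extra uncertainty discussed above):

library(lme4)
## conditional mode for a new group: b = D Z' (Z D Z' + s2*I)^{-1} (y - X beta)
predict_new_subject <- function(fit, newdat) {
  beta <- fixef(fit)                      # estimated fixed effects
  D    <- as.matrix(VarCorr(fit)$ID)      # 2x2 random-effects covariance matrix
  s2   <- sigma(fit)^2                    # residual variance
  Xn   <- model.matrix(~ X + X2, newdat)  # fixed-effects rows for the new subject
  Zn   <- model.matrix(~ X, newdat)       # random-effects rows (intercept + X)
  V    <- Zn %*% D %*% t(Zn) + s2 * diag(nrow(newdat))
  drop(D %*% t(Zn) %*% solve(V, newdat$Y - Xn %*% beta))
}
b14   <- predict_new_subject(z1, dat[88:89, ])   # from the 2 observed points
new14 <- subset(dat, ID == 14)                   # all time points for subject 14
Ypred <- drop(model.matrix(~ X + X2, new14) %*% fixef(z1) +
              model.matrix(~ X, new14) %*% b14)

The coefficients obtained this way should be close to, but not identical with, the subject-14 row of coef(z2), since z2 re-estimates all parameters with the two extra rows included.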


> See below an illustration using a subset of the entire dataset:
> 
> #get file http://bioinformaticsprb.med.wayne.edu/shares/uspar.zip and unzip it
> > load("uspar.RData")

## snip

> >
> > library(lme4)
> > library(lattice)
> > print(xyplot(Y ~ t | ID, dat, aspect = "xy",
> +              layout = c(4,4), type = c("g", "p", "r"),
> +              xlab = "t",
> +              ylab = "Y"))
> >
> > #so straight lines are not good enough, therefore will try some transformations of the time t
> > dat$X=log(dat$t)
> > dat$X2=dat$t*log(dat$t)
> >
> > #fit a mixed model without subject 14
> > z1=lmer(Y~X+X2+(1+X|ID),data=dat[1:87,])
> > coef(z1)

## snip

> > #Question:
> > #Say I only had the first 2 time points for subject 14 and wanted to
> > #make subject-level predictions for it for all time points.
> > #One way seems to be to fit a new model z2 by including the first 2
> > #points of this subject.
> >
> > z2=lmer(Y~X+X2+(1+X|ID),data=dat[1:89,])
> > coef(z2)
> $ID
>    (Intercept)        X           X2
> 1  -0.52677880 1.324643 -0.004759445
> 2  -0.46715932 1.296103 -0.004759445
> 3  -0.93644456 1.423830 -0.004759445
> 4  -0.95853280 1.415826 -0.004759445
> 5  -0.17203704 1.221526 -0.004759445
> 6  -0.24932519 1.241990 -0.004759445
> 7  -0.42432616 1.290164 -0.004759445
> 8  -0.15979241 1.234259 -0.004759445
> 9  -0.55790691 1.325007 -0.004759445
> 10 -0.03503148 1.197623 -0.004759445
> 11 -0.87957274 1.418050 -0.004759445
> 12 -0.97501533 1.424990 -0.004759445
> 13 -0.17154433 1.211794 -0.004759445
> 14 -0.65875042 1.348014 -0.004759445
> 
> attr(,"class")
> [1] "coef.mer"
> >
> > dat$Ypred=predict(z2,dat)
> > print(xyplot(Ypred ~ Y | ID, dat, aspect = "xy",
> +              layout = c(4,4), type = c("g", "p", "r"),
> +              xlab = "Y",
> +              ylab = "Ypred"))
> >
> > #Q1, Is there another way that would not require fitting model z2
> > #but simply use info from z1 and
> > dat[88:89,c("X","X2","Y")]
>           X       X2        Y
> 88 2.706981 40.56130 2.803360
> 89 2.843977 48.87079 2.933857
> 
> Thanks,
> Adi Tarca
>

_______________________________________________
R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models


