[R-sig-ME] longitudinal study with baseline

Fri Sep 3 18:30:50 CEST 2010

Hi all, I just realized that actually it is Marc Schwartz who referred the 
baseline handling paper to me. I am sorry for the error. Both Marc and Dennis 
have been tremendous help in my previous discussion. I would appreciate very 
much if anyone can share their thoughts on this one. Also realized the data file 
"dat.txt" should have the column titled "glucose" as "y" instead to use my code 
below.

Thanks you all.

John

----- Original Message ----
From: array chip <arrayprofile at yahoo.com>
To: r-sig-mixed-models at r-project.org
Sent: Fri, September 3, 2010 12:26:24 AM
Subject: [R-sig-ME] longitudinal study with baseline

Hi all,

I asked some questions on how to analyze longitudinal study with only 2 time 
points (baseline and a follow-up) previously. I appreciate many useful comments 
from some members, especially Dennis Murphy who refered the following paper 
addressing specifically this type of study with only 2 time points: 
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1121605/

Basically, with only 2 time points (baseline and one follow-up), ANCOVA with 
follow-up as dependent variable and baseline as covariate should be used:

follow-up = a + b*baseline + treatment

Now I have a regular longitudinal study with 6 time points, 7 treatments 
(vehicle, A, B, C, D, F, G), measuring a response variable "y". The dataset is 
attached. I have some questions, and appreciate any suggestions on how to 
analyze the dataset.

dat<-read.table("dat.txt",sep='\t',header=T,row.names=NULL)
library(MASS)
dat$trt<-relevel(dat$trt,'vehicle')

xyplot(y~time, groups=trt, data=dat, 
ylim=c(3,10),col=c(1:6,8),lwd=2,type=c('g','a'),xlab='Days',ylab="response",
 key = list(lines=list(col=c(1:6,8),lty=1,lwd=2),
                  text = list(lab = levels(dat$trt)),
                  columns = 3, title = "Treatment"))

So as you can see that there is some curvature between glucose level and time, 
so a quadratic fit might be needed. 

dat$time2<-dat$time*dat$time

A straight fit like below seems reasonable:

fit<-lmer(y~trt*time+trt*time2+(time|id),dat)

Checking on random effects, it appears that variance component for random slope 
is very small, so a simpler model with random intercept only may be sufficient:

fit<-lmer(y~trt*time+trt*time2+(1|id),dat)

Now, I want to incorporate baseline response into the model in order to account 
for any baseline imbalance. I need to generate a new variable "baseline" based 
on glucose levels at time=0:

dat<-merge(dat, dat[dat$time==0,c('id','y')], by.x='id',by.y='id',all.x=T)
colnames(dat)[c(4,6)]<-c('y','baseline')

so the new fit adding baseline into the mixed model is:

fit<-lmer(y~baseline+trt*time+trt*time2+(1|id),dat)

Now my question is 1). Is the above model a reasonable thing to do? 2) when 
baseline is included as a covariate, should I remove the data points at baseline 

from the dataset? I am kind of unsure if it's reasonable to use the baseline 
both as a covariate and as part of the dependent variable values.

Next thing I want to do with this dataset is to do multiple comparisons between 
each treatment (A, B, C, D, F, G) vs. vehicle at a given time point, say time=56 

(the last time points) after adjusting the baseline imbalance. This seems to be 
done using Dunnet test. When I say "after adjusting baseline imbalance", I mean 
the comparisons should be done based on the difference between time=56 and 
time=0 (baseline). How can we test this? Will glht() in multcomp work for a lmer 

fit? If yes, how can I specify the syntax?

Finally, with the above model, is there anyway to estimate the difference (and 
the standard error) between time=56 and time=0 (baseline) for each treatment 
groups?

Thank you fall or your attention.

John