[R-sig-ME] strategy to iterate over repeated measures/longitudinal data

Tue Jul 21 14:43:11 CEST 2009

Hi list,

I posted this question to R-help, but I did not receive
any suggestions. I have rewritten my question, and I think it
may be more familiar to those who use lme4. Let me start with
a basic description.

Let's say we are interested in the regression of y on x1.
We could run lm(). Now let's say y is measured multiple times on
the same individual. These data are in wide format and resemble
longitudinal data. lmer is used to take into account that the
observations of y on the same individual are correlated. But I'm
still only interested in the relationship between y and x1.

So I just convert from wide to long, run lmer, and extract the
coefficient I want.
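
For concreteness, here is a minimal sketch of that single-predictor case. The toy data and the names wide, long, y1, y2 and x1_coef are made up for illustration, and I use base R's reshape() here rather than any particular package; treat it as one possible way to do it, not the only one.

```r
library(lme4)

# Hypothetical toy data: 10 individuals, one two-level predictor x1,
# and y measured twice (columns y1 and y2).
set.seed(1)
wide <- data.frame(
  id = 1:10,
  x1 = rep(c("a", "b"), 5),
  y1 = rnorm(10),
  y2 = rnorm(10)
)

# Wide to long: one row per individual per measurement occasion.
long <- reshape(
  wide,
  direction = "long",
  varying   = c("y1", "y2"),
  v.names   = "y",
  timevar   = "time",
  times     = 1:2,
  idvar     = "id"
)

# A random intercept per individual accounts for the correlation
# between repeated observations on the same person.
fit <- lmer(y ~ x1 + (1 | id), data = long)

# Fixed-effect coefficient for x1 (level "b" versus reference level "a").
x1_coef <- fixef(fit)["x1b"]
```

With pure noise as the response, lmer may report a singular fit; that does not matter for the illustration, since only the extraction of the fixed effect is of interest.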

Now let's say I have several xs: x1, x2, ..., xn. I want the
coefficient for the regression of y on each x separately, so I now
have to iterate through the xs somehow.

Here is an example.

head(wide_data)

 id predictor1 predictor2 predictor3 measurement1 measurement2
1  1          a          a          b  -0.04493361  -0.05612874
2  2          a          a          a  -0.01619026  -0.15579551
3  3          b          b          b   0.94383621  -1.47075238
4  4          b          a          a   0.82122120  -0.47815006
5  5          a          b          a   0.59390132   0.41794156
6  6          b          a          a   0.91897737   1.35867955

The measurements are repeated measures, and I am looking at one
predictor at a time. In the actual problem, there are around 400,000
predictors.

Currently, I do the following.

For each predictor:
1. create a long data set using the predictor and all measurements
(using the make.univ function from the multilevel package)
2. run lmer, extract the coefficient of interest
3. go to next predictor

The end result is a vector of 400,000 coefficients.

Do you have any suggestions on how I can improve this strategy?

Thanks for your help.

Juliet Hannah

Here is an example with inefficient but working code.

library(multilevel)
library(lme4)

#Same data as above
set.seed(1)
wide_data <- data.frame(
   id = 1:10,
   predictor1 = sample(c("a","b"), 10, replace=TRUE),
   predictor2 = sample(c("a","b"), 10, replace=TRUE),
   predictor3 = sample(c("a","b"), 10, replace=TRUE),
   measurement1 = rnorm(10),
   measurement2 = rnorm(10))

#vector of names to iterate over
predictor_names <- colnames(wide_data)[2:4]
#vector to store coefficients
mycoefs <- rep(-1,length(predictor_names))
names(mycoefs) <- predictor_names

for (predictor in predictor_names)
{
  # wide to long: one row per individual per measurement
  long_data <- make.univ(
    data.frame(wide_data$id, wide_data[, predictor]),
    data.frame(wide_data$measurement1, wide_data$measurement2)
  )
  names(long_data) <- c('id', 'predictor', 'time', 'measurement')
  myfit <- lmer(measurement ~ predictor + (1|id), data=long_data)
  # the second fixed effect is the predictor coefficient
  mycoefs[predictor] <- fixef(myfit)[2]
}

mycoefs