[R] select observations from longitudinal data

Sun Mar 29 14:08:11 CEST 2009

Wacek Kusnierczyk wrote:
> gallon li wrote:
>> Suppose I have longitudinal data in long format:
>>
>> id time x
>> 1 1 10
>> 1 2 11
>> 1 3 23
>> 1 4 23
>> 2 2 12
>> 2 3 13
>> 2 4 14
>> 3 1 11
>> 3 3 15
>> 3 4 18
>> 3 5 21
>> 4 2 22
>> 4 3 27
>> 4 6 29
>>
>> I want to select the x value for each ID at time 3. If that observation
>> is missing, I want to use the observation at time 4 instead. If that is
>> missing too, just use NA.
>>   
> 
> with this dummy data:
> 
>     data = read.table(header=TRUE, textConnection(open='r', '
>         id time x      
>         2 2 2
>         2 3 3
>         2 4 4
>         2 5 5
>         3 3 3
>         3 4 4
>         3 5 5
>         4 4 4
>         4 5 5
>         5 5 5'))
> 
> you seem to expect the result to be like
> 
>     # id time x
>     # 2 3 3
>     # 3 3 3
>     # 4 4 4
>     # 5 NA NA
> 
> one way to hack this is:
> 
>     # the time points you'd like to use, in order of preference
>     times = 3:4
> 
>     # split the data by id,
>     # for each subset, find values of x for the first time found, or use NA
>     # combine the subsets back into a single data frame
>     do.call(rbind, by(data, data$id, function(data)
>         with(data, {
>             rows = (time == times[which(times %in% time)[1]])
>             # id[1] keeps the NA result to one row even if the id has several rows
>             if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA)
>             else data[rows,] })))
>     #   id time  x
>     # 2  2    3  3
>     # 3  3    3  3
>     # 4  4    4  4
>     # 5  5   NA NA
> 
> with your original data:
> 
>     data = read.table(header=TRUE, textConnection(open='r', '
>        id time x
>        1 1 10
>        1 2 11
>        1 3 23
>        1 4 23
>        2 2 12
>        2 3 13
>        2 4 14
>        3 1 11
>        3 3 15
>        3 4 18
>        3 5 21
>        4 2 22
>        4 3 27
>        4 6 29'))
>     times = 3:4
>     do.call(rbind, by(data, data$id, function(data)
>         with(data, {
>             rows = (time == times[which(times %in% time)[1]])
>             # id[1] keeps the NA result to one row even if the id has several rows
>             if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA)
>             else data[rows,] })))
> 
>     #   id time  x
>     # 1  1    3 23
>     # 2  2    3 13
>     # 3  3    3 15
>     # 4  4    3 27
> 
> is this what you wanted?

There's also the straightforward answer:

 > sapply(split(data,data$id), function(d) { r <- d$x[d$time==3]
+    if(!length(r)) r <- d$x[d$time==4]
+    if(!length(r)) r <- NA
+    r})
  1  2  3  4
23 13 15 27

or, just to check the case where time==3 is actually missing (dropping
rows 6 and 13, the time==3 rows for ids 2 and 4):

 > sapply(split(data[-c(6,13),],data$id[-c(6,13)]), function(d) {
+    r <- d$x[d$time==3]
+    if(!length(r)) r <- d$x[d$time==4]
+    if(!length(r)) r <- NA
+    r})
  1  2  3  4
23 14 15 NA
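
The same idea can be sketched for an arbitrary preference order of time
points. This is only an illustration: the helper name pick_x and its
times argument are made up here, not something from the thread, and it
assumes there is at most one row per id and time.

```r
# Hypothetical helper: for each id, return x at the first time point in
# `times` that is present, or NA if none of them is.
# Assumes at most one row per (id, time) combination.
pick_x <- function(data, times = 3:4) {
  sapply(split(data, data$id), function(d) {
    # row position within d of each preferred time (NA where absent)
    pos <- match(times, d$time)
    # first preferred time that was actually observed; NA if none was
    hit <- pos[!is.na(pos)][1]
    # indexing with NA yields NA, which covers the "nothing found" case
    d$x[hit]
  })
}

# the original poster's data
data <- read.table(header = TRUE, text = "
id time x
1 1 10
1 2 11
1 3 23
1 4 23
2 2 12
2 3 13
2 4 14
3 1 11
3 3 15
3 4 18
3 5 21
4 2 22
4 3 27
4 6 29
")

pick_x(data)                 # every id has time == 3
pick_x(data[-c(6, 13), ])    # time == 3 removed for ids 2 and 4
```

Using match() on the preference vector means adding a third fallback
time only requires changing times, e.g. pick_x(data, times = c(3, 4, 2)).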

-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907