[R] select observations from longitudinal data
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Sun Mar 29 14:08:11 CEST 2009
Wacek Kusnierczyk wrote:
> gallon li wrote:
>> Suppose I have longitudinal data in long format:
>>
>> id time x
>> 1 1 10
>> 1 2 11
>> 1 3 23
>> 1 4 23
>> 2 2 12
>> 2 3 13
>> 2 4 14
>> 3 1 11
>> 3 3 15
>> 3 4 18
>> 3 5 21
>> 4 2 22
>> 4 3 27
>> 4 6 29
>>
>> I want to select the x values for each ID when time is equal to 3. When that
>> observation is not observed, then I want to replace it with the observation
>> at time equal to 4. otherwise just use NA.
>>
>
> with this dummy data:
>
> data = read.table(header=TRUE, textConnection(open='r', '
> id time x
> 2 2 2
> 2 3 3
> 2 4 4
> 2 5 5
> 3 3 3
> 3 4 4
> 3 5 5
> 4 4 4
> 4 5 5
> 5 5 5'))
>
> you seem to expect the result to be like
>
> # id time x
> # 2 3 3
> # 3 3 3
> # 4 4 4
> # 5 NA NA
>
> one way to hack this is:
>
> # the time points you'd like to use, in order of preference
> times = 3:4
>
> # split the data by id,
> # for each subset, find values of x for the first time found, or use NA
> # combine the subsets back into a single data frame
> do.call(rbind, by(data, data$id, function(data)
> with(data, {
> rows = (time == times[which(times %in% time)[1]])
> if (is.na(rows[1])) data.frame(id=id, time=NA, x=NA) else
> data[rows,] })))
> # id time x
> # 2 2 3 3
> # 3 3 3 3
> # 4 4 4 4
> # 5 5 NA NA
>
> with your original data:
>
> data = read.table(header=TRUE, textConnection(open='r', '
> id time x
> 1 1 10
> 1 2 11
> 1 3 23
> 1 4 23
> 2 2 12
> 2 3 13
> 2 4 14
> 3 1 11
> 3 3 15
> 3 4 18
> 3 5 21
> 4 2 22
> 4 3 27
> 4 6 29'))
> times = 3:4
> do.call(rbind, by(data, data$id, function(data)
> with(data, {
> rows = (time == times[which(times %in% time)[1]])
> if (is.na(rows[1])) data.frame(id=id, time=NA, x=NA) else
> data[rows,] })))
>
> # id time x
> # 1 1 3 23
> # 2 2 3 13
> # 3 3 3 15
> # 4 4 3 27
>
> is this what you wanted?
There's also the straightforward answer:
> sapply(split(data,data$id), function(d) { r <- d$x[d$time==3]
+ if(!length(r)) r <- d$x[d$time==4]
+ if(!length(r)) r <- NA
+ r})
 1  2  3  4
23 13 15 27
or, just to check the case where time==3 is actually missing:
> sapply(split(data[-c(6,13),],data$id[-c(6,13)]), function(d) {
+ r <- d$x[d$time==3]
+ if(!length(r)) r <- d$x[d$time==4]
+ if(!length(r)) r <- NA
+ r})
 1  2  3  4
23 14 15 NA
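
The same idea can be wrapped up so that the preferred time points are given in order, as in the times = 3:4 vector used earlier in the thread. What follows is only a sketch: pick_x and its times argument are names made up here for illustration, and it assumes each id has at most one row per time point.

```r
# Hypothetical helper (not from the original posts): for each id, return x at
# the first time point in `times` that was observed, or NA if none was.
pick_x <- function(data, times = 3:4) {
  sapply(split(data, data$id), function(d) {
    hit <- match(times, d$time)  # row of each preferred time, NA if absent
    hit <- hit[!is.na(hit)]      # keep only the times actually observed
    if (length(hit)) d$x[hit[1]] else NA
  })
}

# The original poster's data
data <- read.table(header = TRUE, text = "
id time x
1 1 10
1 2 11
1 3 23
1 4 23
2 2 12
2 3 13
2 4 14
3 1 11
3 3 15
3 4 18
3 5 21
4 2 22
4 3 27
4 6 29")

pick_x(data)                # ids 1..4: 23 13 15 27
pick_x(data[-c(6, 13), ])   # time==3 dropped for ids 2 and 4: 23 14 15 NA
```

Using match() on the preference vector means further fallbacks only need a longer times argument, e.g. times = c(3, 4, 5), rather than another if() line.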
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907