[R] select observations from longitudinal data

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Sun Mar 29 13:09:56 CEST 2009


gallon li wrote:
> Suppose I have longitudinal data in long format:
>
> id time x
> 1 1 10
> 1 2 11
> 1 3 23
> 1 4 23
> 2 2 12
> 2 3 13
> 2 4 14
> 3 1 11
> 3 3 15
> 3 4 18
> 3 5 21
> 4 2 22
> 4 3 27
> 4 6 29
>
> I want to select the x value for each ID when time is equal to 3. When that
> observation is missing, I want to replace it with the observation
> at time equal to 4. Otherwise just use NA.
>   

with this dummy data:

    data = read.table(header=TRUE, textConnection(open='r', '
        id time x      
        2 2 2
        2 3 3
        2 4 4
        2 5 5
        3 3 3
        3 4 4
        3 5 5
        4 4 4
        4 5 5
        5 5 5'))

you seem to expect the result to be like

    # id time x
    # 2 3 3
    # 3 3 3
    # 4 4 4
    # 5 NA NA

one way to hack this is:

    # the time points you'd like to use, in order of preference
    times = 3:4

    # split the data by id,
    # for each subset, find values of x for the first time found, or use NA
    # combine the subsets back into a single data frame
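    do.call(rbind, by(data, data$id, function(data)
        with(data, {
            rows = (time == times[which(times %in% time)[1]])
            # rows is all NA when none of the preferred times occurs for this id;
            # id[1] keeps the NA result to a single row even for multi-row subsets
            if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA) else
                data[rows,] })))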
    #   id time  x
    # 2  2    3  3
    # 3  3    3  3
    # 4  4    4  4
    # 5  5   NA NA
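
to see why the NA test works, here is a small sketch of the indexing expression evaluated by hand for two ids of the dummy data (the names t4, t5 and first are only illustrative, they are not part of the code above):

```r
times = 3:4

# id 4 has observations at times 4 and 5
t4 = c(4, 5)
times %in% t4                          # FALSE TRUE: only the second preference occurs
first = times[which(times %in% t4)[1]]
first                                  # 4
t4 == first                            # TRUE FALSE: selects the row at time 4

# id 5 has a single observation at time 5
t5 = 5
which(times %in% t5)[1]                # NA: taking [1] of an empty index gives NA
t5 == times[which(times %in% t5)[1]]   # NA, which is what is.na(rows[1]) detects
```

so the first element of times that occurs for an id wins, and an id with none of them ends up with an all-NA rows vector.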

with your original data:

    data = read.table(header=TRUE, textConnection(open='r', '
       id time x
       1 1 10
       1 2 11
       1 3 23
       1 4 23
       2 2 12
       2 3 13
       2 4 14
       3 1 11
       3 3 15
       3 4 18
       3 5 21
       4 2 22
       4 3 27
       4 6 29'))
    times = 3:4
    do.call(rbind, by(data, data$id, function(data)
        with(data, {
            rows = (time == times[which(times %in% time)[1]])
            # id[1] keeps the NA result to a single row even for multi-row subsets
            if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA) else
                data[rows,] })))

    #   id time  x
    # 1  1    3 23
    # 2  2    3 13
    # 3  3    3 15
    # 4  4    3 27
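
note that every id in the original data has an observation at time 3, so neither the fallback to time 4 nor the NA case is exercised there. as an alternative, the same selection can probably be done without by(), ranking rows with match() and filling in missing ids with merge(). this is only a sketch: the function name pick_first_time and the helper variables pref, keep, cand and best are made up for illustration, and it assumes at most one row per id and time.

```r
# hypothetical helper: for each id, pick the row at the first preferred time
pick_first_time = function(data, times) {
    # preference rank of each row: 1 for times[1], 2 for times[2], NA otherwise
    pref = match(data$time, times)
    keep = !is.na(pref)
    # candidate rows, sorted by id and then by preference
    cand = data[keep, ][order(data$id[keep], pref[keep]), ]
    # the first candidate per id is the most preferred one
    best = cand[!duplicated(cand$id), ]
    # join against all ids so that ids without a candidate get NA
    merge(data.frame(id = unique(data$id)), best, all.x = TRUE)
}

# the dummy data from above, written out as a data frame
dummy = data.frame(
    id   = c(2, 2, 2, 2, 3, 3, 3, 4, 4, 5),
    time = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5),
    x    = c(2, 3, 4, 5, 3, 4, 5, 4, 5, 5))

result = pick_first_time(dummy, 3:4)
result
#   id time  x
# 1  2    3  3
# 2  3    3  3
# 3  4    4  4
# 4  5   NA NA
```

unlike the by() version, the row names here are just 1..n rather than the ids.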

is this what you wanted?

vQ



