[R] select observations from longitudinal data
Wacek Kusnierczyk
Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Sun Mar 29 13:09:56 CEST 2009
gallon li wrote:
> Suppose I have longitudinal data in long format:
>
> id time x
> 1 1 10
> 1 2 11
> 1 3 23
> 1 4 23
> 2 2 12
> 2 3 13
> 2 4 14
> 3 1 11
> 3 3 15
> 3 4 18
> 3 5 21
> 4 2 22
> 4 3 27
> 4 6 29
>
> I want to select the x value for each ID at time 3. When that
> observation is missing, I want to replace it with the observation
> at time 4; if that is missing too, just use NA.
>
with this dummy data:
data = read.table(header=TRUE, textConnection(open='r', '
id time x
2 2 2
2 3 3
2 4 4
2 5 5
3 3 3
3 4 4
3 5 5
4 4 4
4 5 5
5 5 5'))
you seem to expect the result to be like
# id time x
# 2 3 3
# 3 3 3
# 4 4 4
# 5 NA NA
one way to hack this is:
# the time points you'd like to use, in order of preference
times = 3:4
# split the data by id,
# for each subset, find values of x for the first time found, or use NA
# combine the subsets back into a single data frame
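do.call(rbind, by(data, data$id, function(data)
   with(data, {
      # rows at the first preferred time present for this id (all NA if none is)
      rows = (time == times[which(times %in% time)[1]])
      # id[1], not id: the NA fallback must be one row even if the id has several
      if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA) else
         data[rows,] })))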
#   id time  x
# 2  2    3  3
# 3  3    3  3
# 4  4    4  4
# 5  5   NA NA
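to see why the is.na test catches the 'no usable time' case, here is a minimal sketch of the selection expression on bare vectors. the helper name pick is made up for illustration and is not part of the code above:

```r
times = 3:4

# hypothetical helper: the first preferred time present in 'time', else NA
pick = function(time) times[which(times %in% time)[1]]

pick(c(2, 3, 4, 5))   # 3 is present, so it wins over 4
pick(c(4, 5))         # 3 is missing, so fall back to 4
pick(5)               # neither is present: which() is empty, so the result is NA

# comparing against NA yields NA, which is what is.na(rows[1]) detects
5 == pick(5)
```

indexing an empty which() result with [1] gives NA, and indexing times with NA gives NA again, so no explicit length check is needed.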
with your original data:
data = read.table(header=TRUE, textConnection(open='r', '
id time x
1 1 10
1 2 11
1 3 23
1 4 23
2 2 12
2 3 13
2 4 14
3 1 11
3 3 15
3 4 18
3 5 21
4 2 22
4 3 27
4 6 29'))
times = 3:4
do.call(rbind, by(data, data$id, function(data)
   with(data, {
      rows = (time == times[which(times %in% time)[1]])
      # id[1], not id: the NA fallback must be one row even if the id has several
      if (is.na(rows[1])) data.frame(id=id[1], time=NA, x=NA) else
         data[rows,] })))
#   id time  x
# 1  1    3 23
# 2  2    3 13
# 3  3    3 15
# 4  4    3 27
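if you would rather avoid by(), the same selection can probably be done with match, duplicated and merge. this is only a sketch of an alternative, not a replacement for the code above. it assumes at most one row per id and time, the names pref, cand, best and result are made up, and it uses the dummy data so that the NA case shows up:

```r
data = read.table(header=TRUE, text='
id time x
2 2 2
2 3 3
2 4 4
2 5 5
3 3 3
3 4 4
3 5 5
4 4 4
4 5 5
5 5 5')
times = 3:4

# position of each row's time in 'times' (1 = most preferred, NA = not wanted)
pref = match(data$time, times)
keep = !is.na(pref)

# candidate rows only, most preferred time first within each id
cand = data[keep, ]
cand = cand[order(cand$id, pref[keep]), ]

# the first candidate per id is the selected row
best = cand[!duplicated(cand$id), ]

# merge onto the full list of ids, so ids without a candidate get NA
result = merge(data.frame(id=unique(data$id)), best, by='id', all.x=TRUE)
result
#   id time  x
# 1  2    3  3
# 2  3    3  3
# 3  4    4  4
# 4  5   NA NA
```

the row names differ from the by() version (1 to 4 rather than the ids), but the id, time and x columns should agree.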
is this what you wanted?
vQ