[R] select observations from longitudinal data set

Gabor Grothendieck ggrothendieck at gmail.com
Sun Jan 18 14:29:08 CET 2009


Try this.  'by' splits the data frame into one data frame
per id, and f then acts separately on each such sub-dataframe,
returning a ts object with NAs at the missing times.  cbind'ing
those together gives a multivariate series with one column
per id:

> tt
Time Series:
Start = 1
End = 6
Frequency = 1
   1  2  3  4  5
1 10  8  8  9  7
2 12 NA NA NA  9
3 15  9 NA NA NA
4 NA 11 16 NA NA
5 NA 12 NA 13 NA
6 18 NA NA NA 11

Finally, we use nested ifelse calls to choose the value at time 4,
falling back to time 3 and then to time 5.

> library(zoo)
> f <- function(d) as.ts(zoo(d$y, d$time, freq = 1))
> tt <- do.call(cbind, by(dat, dat$id, f))
> ifelse(is.na(tt[4,]), ifelse(is.na(tt[3,]), tt[5,], tt[3,]), tt[4,])
 1  2  3  4  5
15 11 16 13 NA
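
To reproduce the steps above without the zoo package, here is a minimal
base-R sketch.  It assumes dat holds only the 15 rows shown in the quoted
question (the elided "...." rows are ignored), that times are positive
integers, and that each id has at most one row per time.  f_base and
tt_base are hypothetical names used for illustration; f_base is a stand-in
for f above, not the zoo code itself, and split/lapply plays the role of 'by'.

```r
# Example data copied from the quoted question (elided rows not included).
dat <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5),
  time = c(1, 2, 3, 6, 1, 3, 4, 5, 1, 4, 1, 5, 1, 2, 6),
  y    = c(10, 12, 15, 18, 8, 9, 11, 12, 8, 16, 9, 13, 7, 9, 11)
)

# Hypothetical stand-in for f: a regular ts spanning the id's own time
# range, with NA at every time that has no observation.
f_base <- function(d) {
  rng <- range(d$time)
  out <- rep(NA_real_, rng[2] - rng[1] + 1)
  out[d$time - rng[1] + 1] <- d$y
  ts(out, start = rng[1], frequency = 1)
}

# One ts per id, then column-bind them; cbind on ts objects aligns them
# on the union of their time ranges.
tt_base <- do.call(cbind, lapply(split(dat, dat$id), f_base))

# Same nested ifelse as above: time 4, else time 3, else time 5.
res <- ifelse(is.na(tt_base[4, ]),
              ifelse(is.na(tt_base[3, ]), tt_base[5, ], tt_base[3, ]),
              tt_base[4, ])
```

With this dat, tt_base should hold the same values as the tt printed above.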

As in the example data, we have assumed that at least one of the
sub-dataframes has a point at time 1 and at least one has a
point at time 5, so that tt starts at time 1 and rows 3, 4 and 5
of tt correspond to times 3, 4 and 5.
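
If that assumption is a concern, one alternative is to look the times up
by value rather than by row position.  The following is a base-R sketch,
not the method above; it assumes dat contains only the rows shown in the
quoted question and that each id has at most one row per time.  pick and
its priority argument are hypothetical names introduced for illustration.

```r
# Example data copied from the quoted question (elided rows not included).
dat <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5),
  time = c(1, 2, 3, 6, 1, 3, 4, 5, 1, 4, 1, 5, 1, 2, 6),
  y    = c(10, 12, 15, 18, 8, 9, 11, 12, 8, 16, 9, 13, 7, 9, 11)
)

# Hypothetical helper: return the row for the first time in 'priority'
# that is present for this id, or a row with y = NA if none is present.
pick <- function(d, priority = c(4, 3, 5)) {
  hit <- match(priority, d$time)   # row index per candidate time, NA if absent
  hit <- hit[!is.na(hit)]
  if (length(hit) == 0) {
    # Mirrors the "5 4 NA" row in the question: report the preferred time.
    data.frame(id = d$id[1], time = priority[1], y = NA_real_)
  } else {
    d[hit[1], ]
  }
}

# Apply pick to each id's rows and stack the results.
sel <- do.call(rbind, lapply(split(dat, dat$id), pick))
```

Because the lookup uses the time values themselves, it does not depend on
which times happen to be present for other ids.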

On Sun, Jan 18, 2009 at 2:42 AM, gallon li <gallon.li at gmail.com> wrote:
> I have the following longitudinal data:
>
> id time y
> 1 1 10
> 1 2 12
> 1 3 15
> 1 6 18
> 2 1 8
> 2 3 9
> 2 4 11
> 2 5 12
> 3 1 8
> 3 4 16
> 4 1 9
> 4 5 13
> 5 1 7
> 5 2 9
> 5 6 11
> ....
>
> I want to select the observations at time 4. If the observation at time 4 is
> missing, then I want to select the observation at time 3. If the observation
> at time 3 is also missing, then I want to select the observation at time 5.
> Otherwise I will put a missing value there. The selected set looks like
>
> id time y
> 1 3 15
> 2 4 11
> 3 4 16
> 4 5 13
> 5 4 NA
> ...
>
> so the rule is (1) obs at time 4 for each id; (2) if no such obs, then look
> for obs at time 3; (3) if no such obs, then look for obs at time 5; (4)
> otherwise, NA.