[R] select observations from longitudinal data

Bill.Venables at csiro.au
Sun Mar 29 11:45:59 CEST 2009


Let's tackle the bigger problem of doing this not just for time = 3 but for all times.

First we start with your data frame:

> dat
   id time  x
1   1    1 10
2   1    2 11
3   1    3 23
4   1    4 23
5   2    2 12
6   2    3 13
7   2    4 14
8   3    1 11
9   3    3 15
10  3    4 18
11  3    5 21
12  4    2 22
13  4    3 27
14  4    6 29
> 
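
To follow along, the data frame printed above can be rebuilt by hand. This is only a sketch that retypes the values from the original message; nothing else is assumed.

```r
# Rebuild the example data frame from the values in the original message.
dat <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  time = c(1, 2, 3, 4, 2, 3, 4, 1, 3, 4, 5, 2, 3, 6),
  x    = c(10, 11, 23, 23, 12, 13, 14, 11, 15, 18, 21, 22, 27, 29)
)
str(dat)  # three numeric columns, 14 rows
```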

### Now put the data into an id x time matrix, with gaps:

> mat <- with(dat, {
+ 	lev_id <- sort(unique(id))
+ 	lev_tm <- sort(unique(time))
+ 	M <- matrix(NA, length(lev_id), length(lev_tm))
+ 	dimnames(M) <- list(id = lev_id, time = lev_tm)
+ 	M[cbind(match(id, lev_id), match(time, lev_tm))] <- x
+ 	M
+   })
> mat
   time
id   1  2  3  4  5  6
  1 10 11 23 23 NA NA
  2 NA 12 13 14 NA NA
  3 11 NA 15 18 21 NA
  4 NA 22 27 NA NA 29
> 
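
For the narrower question as originally posed (x at time 3 for each id, falling back to time 4 for the same id, otherwise NA), an id x time matrix like the one above is already enough. The following is a minimal sketch, not part of the solution in this message. It assumes at most one observation per (id, time) cell, builds the matrix with tapply() only to keep the example self-contained, and uses x3 as a name made up for illustration.

```r
dat <- data.frame(
  id   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  time = c(1, 2, 3, 4, 2, 3, 4, 1, 3, 4, 5, 2, 3, 6),
  x    = c(10, 11, 23, 23, 12, 13, 14, 11, 15, 18, 21, 22, 27, 29)
)

# id x time matrix; empty cells come back as NA.
# sum() acts as the identity here because each cell has at most one value.
mat <- with(dat, tapply(x, list(id = id, time = time), sum))

# Take time 3 where observed, otherwise time 4 (which may itself be NA).
x3 <- ifelse(is.na(mat[, "3"]), mat[, "4"], mat[, "3"])
x3
```

In this particular data set every id happens to have an observation at time 3, so the fallback to time 4 is never used.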

### Now do the replacements: each gap in row i-1 is filled from
### the same time column of row i, i.e. from the next id
### (this is a very questionable dodge, by the way!)

> for(i in 2:nrow(mat)) 
+ 	if(any(k <- is.na(mat[i-1, ]))) 
+ 		mat[i-1, k] <- mat[i, k]
> 
> mat
   time
id   1  2  3  4  5  6
  1 10 11 23 23 NA NA
  2 11 12 13 14 21 NA
  3 11 22 15 18 21 29
  4 NA 22 27 NA NA 29
> 
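
Because row i is only read, and not modified, before it is used to fill row i-1, the loop above can also be written without a loop. This is a sketch of an equivalent vectorised form under that observation; the names nxt, gap and filled are introduced here for illustration only.

```r
# The id x time matrix with gaps, retyped from the output above.
mat <- matrix(c(10, 11, 23, 23, NA, NA,
                NA, 12, 13, 14, NA, NA,
                11, NA, 15, 18, 21, NA,
                NA, 22, 27, NA, NA, 29),
              nrow = 4, byrow = TRUE,
              dimnames = list(id = as.character(1:4),
                              time = as.character(1:6)))

# Row i of nxt holds the original row i + 1; the last row is all NA.
nxt <- rbind(mat[-1, , drop = FALSE], NA)

# Fill every gap from the corresponding cell of the next row.
gap <- is.na(mat)
filled <- mat
filled[gap] <- nxt[gap]
filled
```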

### some gaps cannot be filled.
### now turn it back into a data frame;
### this is a splendid trick that no one knows about:

> dat <- as.data.frame(as.table(mat), responseName = "x")
> dat <- with(dat, dat[order(id, time), ])
> 
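
To see what the as.table() step does on its own, here is a small sketch on a made-up 2 x 2 matrix (the matrix m and the name long are illustrative, not from the data above). Each cell becomes one row, the dimnames become factor columns, and responseName names the value column.

```r
# A tiny id x time matrix with one gap.
m <- matrix(c(10, NA, 11, 12), nrow = 2,
            dimnames = list(id = c("1", "2"), time = c("1", "2")))

# One row per cell, in column-major order (id varies fastest).
long <- as.data.frame(as.table(m), responseName = "x")
long
str(long)  # id and time are factors, x is numeric
```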

### this will look OK, but the first two columns are now factors.
### If we started with numeric variables, it may be useful to turn
### them back into numeric variables again:

> dat <- within(dat, {
+ 	id <- as.numeric(as.character(id))
+ 	time <- as.numeric(as.character(time))
+   })
> 
> dat
   id time  x
1   1    1 10
5   1    2 11
9   1    3 23
13  1    4 23
17  1    5 NA
21  1    6 NA
2   2    1 11
6   2    2 12
10  2    3 13
14  2    4 14
18  2    5 21
22  2    6 NA
3   3    1 11
7   3    2 22
11  3    3 15
15  3    4 18
19  3    5 21
23  3    6 29
4   4    1 NA
8   4    2 22
12  4    3 27
16  4    4 NA
20  4    5 NA
24  4    6 29
> 
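
The detour through as.character() in the conversion above matters: calling as.numeric() directly on a factor returns the internal level codes, not the labels. A short sketch with a made-up factor f:

```r
f <- factor(c("2", "4", "6"))

as.numeric(f)                # level codes: 1 2 3
as.numeric(as.character(f))  # the labels as numbers: 2 4 6
as.numeric(levels(f))[f]     # same result, converting each level only once
```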

### As many gaps have been filled as can be filled (with fake data!).  

### If you want to remove those still missing, you can use

> dat <- na.omit(dat)
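
Note that na.omit() drops a row if any of its columns is NA, not only x. Here only x can be missing, so it gives the same rows as subsetting on x directly. A small sketch with a made-up data frame d:

```r
d <- data.frame(id = c(1, 1, 2), time = c(1, 2, 1), x = c(10, NA, 11))

kept <- na.omit(d)        # drops the row where x is NA
same <- d[!is.na(d$x), ]  # explicit subset on x alone

kept
```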



Bill Venables
http://www.cmis.csiro.au/bill.venables/ 


-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of gallon li
Sent: Sunday, 29 March 2009 6:51 PM
To: r-help
Subject: [R] select observations from longitudinal data

Suppose I have longitudinal data in long format:

id time x
1 1 10
1 2 11
1 3 23
1 4 23
2 2 12
2 3 13
2 4 14
3 1 11
3 3 15
3 4 18
3 5 21
4 2 22
4 3 27
4 6 29

I want to select the x values for each ID when time is equal to 3. When that
observation is not observed, then I want to replace it with the observation
at time equal to 4. Otherwise just use NA.

How can I implement this with a quick command?
