[R] Selecting values

Sat Sep 29 00:08:51 CEST 2007

On Fri, 2007-09-28 at 17:48 -0400, Brian Perron wrote:
> Hello all,
> 
> An elementary question that I am sure can be easily cracked by an R
> enthusiast.  Let's say I have multiple scores (y) on subjects (x.sample).
> Some subjects have a few more scores than others.  Can somebody suggest some
> code that will select the first score for each subject?
> 
> For example, the following code generates scores for 5 subjects:
> 
> > x <- c(1:5)
> > x.sample <- sample(x, 20, replace = TRUE)
> > x.sample <- sort(x.sample)
> > y <- rnorm(20)
> > z <- cbind(x.sample, y)
> > z
> 
>       x.sample          y
>  [1,]        1 -1.2006469
>  [2,]        1  0.7615261
>  [3,]        1 -0.1287516
>  [4,]        1 -1.1796474
>  [5,]        1 -1.2902519
>  [6,]        2 -0.1614918
>  [7,]        2 -0.1464773
>  [8,]        2 -0.8875417
>  [9,]        2  0.3062891
> [10,]        2  0.4398530
> [11,]        3 -0.5717729
> [12,]        3 -0.2938118
> [13,]        4 -0.2398887
> [14,]        4  0.8425419
> [15,]        4  2.5269801
> [16,]        4 -0.3643613
> [17,]        5  1.1690564
> [18,]        5 -0.7644521
> [19,]        5  1.4178982
> [20,]        5 -0.8198921
> 
> I am only interested in extracting the first score (y) for each unique
> subject (x.sample).  So, I would like to generate the following output.
> 
>      x.sample          y
> [1,]        1 -1.2006469
> [2,]        2 -0.1614918
> [3,]        3 -0.5717729
> [4,]        4 -0.2398887
> [5,]        5  1.1690564
> 
> Any assistance would be greatly appreciated.
> 
> Regards,
> Brian

See ?split, ?sapply and ?unique.

Then try this:

> cbind(unique(z[, 1]), sapply(split(z[, 2], z[, 1]), "[", 1))
  [,1]       [,2]
1    1 -1.2006469
2    2 -0.1614918
3    3 -0.5717729
4    4 -0.2398887
5    5  1.1690564

The key part of that is:

> split(z[, 2], z[, 1])
$`1`
[1] -1.2006469  0.7615261 -0.1287516 -1.1796474 -1.2902519

$`2`
[1] -0.1614918 -0.1464773 -0.8875417  0.3062891  0.4398530

$`3`
[1] -0.5717729 -0.2938118

$`4`
[1] -0.2398887  0.8425419  2.5269801 -0.3643613

$`5`
[1]  1.1690564 -0.7644521  1.4178982 -0.8198921

which splits the second column of 'z' (the scores) into a list of
vectors, one per value in the first column (the subjects).

Then we use sapply() to go through the list and subset the first element
in each vector:

> sapply(split(z[, 2], z[, 1]), "[", 1)
         1          2          3          4          5 
-1.2006469 -0.1614918 -0.5717729 -0.2398887  1.1690564 
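
A short aside on the "[" argument, in case it looks odd: in R the
subsetting operator is itself a function, so passing "[" with an extra
argument of 1 should behave the same as writing out function(v) v[1].
A minimal sketch, where 'scores' is a made-up list standing in for the
output of split() and is not the data above:

```r
# 'scores' is a hypothetical list, shaped like the result of split()
scores <- list(`1` = c(-1.2, 0.8, -0.1), `2` = c(-0.2, 0.3), `3` = 0.5)

# "[" is the subsetting function; the trailing 1 is passed on as the index
first_a <- sapply(scores, "[", 1)

# The same operation written as an explicit anonymous function
first_b <- sapply(scores, function(v) v[1])

# Compare the two results
identical(first_a, first_b)
```

The explicit form is longer but may be easier to read if you are new
to the apply family.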

Then we cbind() that result to the unique values in the first column.
The two line up because 'z' is already sorted by subject.
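
If it helps, here are the three steps put together as a self-contained
sketch. The matrix 'z2' and its values are made up for illustration
(they are not your data), and the object names are my own. I also take
the subject ids from the names of the sapply() result rather than from
unique(), which I would expect to be a little safer in general, and
show a duplicated() variant as one possible alternative:

```r
# Small fixed example (made-up values) laid out like 'z':
# column 1 is the subject, column 2 is the score
subject <- c(1, 1, 2, 2, 2, 3)
score   <- c(-1.2, 0.8, -0.2, -0.1, 0.3, -0.6)
z2 <- cbind(subject, score)

# Steps 1 and 2: split the scores by subject, then take the first of each
first <- sapply(split(z2[, 2], z2[, 1]), "[", 1)

# Step 3: bind the subject ids to the first scores. The ids come from
# names(first), so each id sits next to the score from its own group
res <- cbind(subject = as.numeric(names(first)), score = first)

# Alternative: keep the first row seen for each subject
res2 <- z2[!duplicated(z2[, 1]), ]

res
res2
```

On sorted input both approaches should give the same rows. On unsorted
input the split() version returns subjects in sorted order, while the
duplicated() version keeps them in order of first appearance.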

HTH,

Marc Schwartz