[R] Selecting values

Marc Schwartz marc_schwartz at comcast.net
Sat Sep 29 00:58:52 CEST 2007


Here is yet another approach using aggregate(), which internally does
basically what my first solution did:

> aggregate(z[, 2], list(z[, 1]), "[", 1)
  Group.1          x
1       1 -1.2006469
2       2 -0.1614918
3       3 -0.5717729
4       4 -0.2398887
5       5  1.1690564

See ?aggregate

Note that you get a data frame as a result, rather than a matrix.
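
If a matrix is what you need, the data frame can be converted afterwards.
A minimal sketch, assuming the same two-column layout as 'z' and rebuilding
the data from the values in the original post; the names 'first' and 'm'
are made up here for illustration:

```r
# Rebuild the example data from the original post (sorted by subject)
x.sample <- rep(1:5, times = c(5, 5, 2, 4, 4))
y <- c(-1.2006469,  0.7615261, -0.1287516, -1.1796474, -1.2902519,
       -0.1614918, -0.1464773, -0.8875417,  0.3062891,  0.4398530,
       -0.5717729, -0.2938118,
       -0.2398887,  0.8425419,  2.5269801, -0.3643613,
        1.1690564, -0.7644521,  1.4178982, -0.8198921)
z <- cbind(x.sample, y)

# First score per subject, returned as a data frame
first <- aggregate(z[, 2], list(z[, 1]), "[", 1)

# Convert to a numeric matrix and restore the original column names
m <- as.matrix(first)
colnames(m) <- colnames(z)
m
```

This relies on the rows for each subject already being in the order you
want, since "[" with 1 simply takes whichever score comes first.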

Also, you could 'collapse' the split() and sapply() part of my first
solution using tapply():

> tapply(z[, 2], z[, 1], "[", 1)
         1          2          3          4          5 
-1.2006469 -0.1614918 -0.5717729 -0.2398887  1.1690564 
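
The tapply() result is a named vector rather than the two-column layout
asked for, so the subject ids have to be recovered from the names. A rough
sketch, again rebuilding the data from the original post; 'first.y' and
'res' are names invented for this example:

```r
# Rebuild the example data from the original post (sorted by subject)
x.sample <- rep(1:5, times = c(5, 5, 2, 4, 4))
y <- c(-1.2006469,  0.7615261, -0.1287516, -1.1796474, -1.2902519,
       -0.1614918, -0.1464773, -0.8875417,  0.3062891,  0.4398530,
       -0.5717729, -0.2938118,
       -0.2398887,  0.8425419,  2.5269801, -0.3643613,
        1.1690564, -0.7644521,  1.4178982, -0.8198921)
z <- cbind(x.sample, y)

# First score per subject; the names of the result are the subject ids
first.y <- tapply(z[, 2], z[, 1], "[", 1)

# Bind the ids (taken from the names) to the scores as a two-column matrix
res <- cbind(x.sample = as.numeric(names(first.y)),
             y        = as.vector(first.y))
res

# Sanity check: same values as the split()/sapply() version
same <- isTRUE(all.equal(as.vector(first.y),
                         unname(sapply(split(z[, 2], z[, 1]), "[", 1))))
same
```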


As has been said by fortune("Yoda"):

Evelyn Hall: I would like to know how (if) I can extract some of the
information from the summary of my nlme.
Simon Blomberg: This is R. There is no if. Only how.
   -- Evelyn Hall and Simon 'Yoda' Blomberg
      R-help (April 2005)

HTH,

Marc Schwartz

On Fri, 2007-09-28 at 16:12 -0600, Matthew Keller wrote:
> Is this easier?
> 
> x.index <- !duplicated(x.sample)
> cbind(x.sample[x.index],y[x.index])
> 
> 
> - Matt
> 
> On 9/28/07, Marc Schwartz <marc_schwartz at comcast.net> wrote:
> > On Fri, 2007-09-28 at 17:48 -0400, Brian Perron wrote:
> > > Hello all,
> > >
> > > An elementary question that I am sure can be easily cracked by an R
> > > enthusiast.  Let's say I have multiple scores (y) on subjects (x.sample).
> > > Some subjects have a few more scores than others.  Can somebody suggest some
> > > code that will select the first score for each subject?
> > >
> > > For example, the following code generates scores for 5 subjects:
> > >
> > > > x <- c(1:5)
> > > > x.sample <- sample(x, 20, replace = TRUE)
> > > > x.sample <- sort(x.sample)
> > > > y <- rnorm(20)
> > > > z <- cbind(x.sample, y)
> > > > z
> > >
> > >       x.sample          y
> > >  [1,]        1 -1.2006469
> > >  [2,]        1  0.7615261
> > >  [3,]        1 -0.1287516
> > >  [4,]        1 -1.1796474
> > >  [5,]        1 -1.2902519
> > >  [6,]        2 -0.1614918
> > >  [7,]        2 -0.1464773
> > >  [8,]        2 -0.8875417
> > >  [9,]        2  0.3062891
> > > [10,]        2  0.4398530
> > > [11,]        3 -0.5717729
> > > [12,]        3 -0.2938118
> > > [13,]        4 -0.2398887
> > > [14,]        4  0.8425419
> > > [15,]        4  2.5269801
> > > [16,]        4 -0.3643613
> > > [17,]        5  1.1690564
> > > [18,]        5 -0.7644521
> > > [19,]        5  1.4178982
> > > [20,]        5 -0.8198921
> > >
> > > I am only interested in extracting the first score (y) for each unique
> > > subject (x.sample).  So, I would like to generate the following output.
> > >
> > >      x.sample          y
> > > [1,]        1 -1.2006469
> > > [2,]        2 -0.1614918
> > > [3,]        3 -0.5717729
> > > [4,]        4 -0.2398887
> > > [5,]        5  1.1690564
> > >
> > > Any assistance would be greatly appreciated.
> > >
> > > Regards,
> > > Brian
> >
> > See ?split, ?sapply and ?unique.
> >
> > Then try this:
> >
> > > cbind(unique(z[, 1]), sapply(split(z[, 2], z[, 1]), "[", 1))
> >   [,1]       [,2]
> > 1    1 -1.2006469
> > 2    2 -0.1614918
> > 3    3 -0.5717729
> > 4    4 -0.2398887
> > 5    5  1.1690564
> >
> >
> > The key part of that is:
> >
> > > split(z[, 2], z[, 1])
> > $`1`
> > [1] -1.2006469  0.7615261 -0.1287516 -1.1796474 -1.2902519
> >
> > $`2`
> > [1] -0.1614918 -0.1464773 -0.8875417  0.3062891  0.4398530
> >
> > $`3`
> > [1] -0.5717729 -0.2938118
> >
> > $`4`
> > [1] -0.2398887  0.8425419  2.5269801 -0.3643613
> >
> > $`5`
> > [1]  1.1690564 -0.7644521  1.4178982 -0.8198921
> >
> >
> > which splits the second column of 'z' by the values in the first column.
> >
> > Then we use sapply() to go through the list and subset the first element
> > in each vector:
> >
> > > sapply(split(z[, 2], z[, 1]), "[", 1)
> >          1          2          3          4          5
> > -1.2006469 -0.1614918 -0.5717729 -0.2398887  1.1690564
> >
> >
> > Then we cbind() that result to the unique values in the first column.
> >
> > HTH,
> >
> > Marc Schwartz
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
>


