[R] Selecting values
Matthew Keller
mckellercran at gmail.com
Sat Sep 29 00:12:50 CEST 2007
Is this easier?
x.index <- !duplicated(x.sample)
cbind(x.sample[x.index], y[x.index])
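
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on simulated data like that in the original question. The set.seed() call and the name first.scores are assumptions added for illustration; they are not in the original post.

```r
# Simulated data as in the original question. set.seed() is an added
# assumption so that the example is reproducible.
set.seed(1)
x.sample <- sort(sample(1:5, 20, replace = TRUE))
y <- rnorm(20)

# duplicated() flags every repeat of a value after its first occurrence,
# so negating it keeps exactly the first row for each subject.
x.index <- !duplicated(x.sample)

# first.scores is a hypothetical name for the resulting two-column matrix.
first.scores <- cbind(x.sample = x.sample[x.index], y = y[x.index])
first.scores
```

This does not depend on x.sample being sorted: rows come back in order of each subject's first appearance.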
- Matt
On 9/28/07, Marc Schwartz <marc_schwartz at comcast.net> wrote:
> On Fri, 2007-09-28 at 17:48 -0400, Brian Perron wrote:
> > Hello all,
> >
> > An elementary question that I am sure can be easily cracked by an R
> > enthusiast. Let's say I have multiple scores (y) on subjects (x.sample).
> > Some subjects have a few more scores than others. Can somebody suggest some
> > code that will select the first score for each subject?
> >
> > For example, the following code generates scores for 5 subjects:
> >
> > > x <- c(1:5)
> > > x.sample <- sample(x, 20, replace = TRUE)
> > > x.sample <- sort(x.sample)
> > > y <- rnorm(20)
> > > z <- cbind(x.sample, y)
> > > z
> >
> >       x.sample          y
> >  [1,]        1 -1.2006469
> >  [2,]        1  0.7615261
> >  [3,]        1 -0.1287516
> >  [4,]        1 -1.1796474
> >  [5,]        1 -1.2902519
> >  [6,]        2 -0.1614918
> >  [7,]        2 -0.1464773
> >  [8,]        2 -0.8875417
> >  [9,]        2  0.3062891
> > [10,]        2  0.4398530
> > [11,]        3 -0.5717729
> > [12,]        3 -0.2938118
> > [13,]        4 -0.2398887
> > [14,]        4  0.8425419
> > [15,]        4  2.5269801
> > [16,]        4 -0.3643613
> > [17,]        5  1.1690564
> > [18,]        5 -0.7644521
> > [19,]        5  1.4178982
> > [20,]        5 -0.8198921
> >
> > I am only interested in extracting the first score (y) for each unique
> > subject (x.sample). So, I would like to generate the following output.
> >
> >      x.sample          y
> > [1,]        1 -1.2006469
> > [2,]        2 -0.1614918
> > [3,]        3 -0.5717729
> > [4,]        4 -0.2398887
> > [5,]        5  1.1690564
> >
> > Any assistance would be greatly appreciated.
> >
> > Regards,
> > Brian
>
> See ?split, ?sapply and ?unique.
>
> Then try this:
>
> > cbind(unique(z[, 1]), sapply(split(z[, 2], z[, 1]), "[", 1))
>   [,1]       [,2]
> 1    1 -1.2006469
> 2    2 -0.1614918
> 3    3 -0.5717729
> 4    4 -0.2398887
> 5    5  1.1690564
>
>
> The key part of that is:
>
> > split(z[, 2], z[, 1])
> $`1`
> [1] -1.2006469 0.7615261 -0.1287516 -1.1796474 -1.2902519
>
> $`2`
> [1] -0.1614918 -0.1464773 -0.8875417 0.3062891 0.4398530
>
> $`3`
> [1] -0.5717729 -0.2938118
>
> $`4`
> [1] -0.2398887 0.8425419 2.5269801 -0.3643613
>
> $`5`
> [1] 1.1690564 -0.7644521 1.4178982 -0.8198921
>
>
> which splits the second column of 'z' (the scores) by the values in the
> first column (the subjects).
>
> Then we use sapply() to go through the list and subset the first element
> in each vector:
>
> > sapply(split(z[, 2], z[, 1]), "[", 1)
>          1          2          3          4          5
> -1.2006469 -0.1614918 -0.5717729 -0.2398887  1.1690564
>
>
> Then we cbind() that result to the unique values in the first column.
>
> HTH,
>
> Marc Schwartz
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
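
The split() and sapply() steps quoted above can be sketched on a small subset of the posted values. The names groups, first and result are hypothetical, introduced only for this illustration. One deliberate difference from the quoted one-liner: this sketch takes the subject IDs from names() of the sapply() result rather than from unique(z[, 1]). split() orders its groups by sorted factor level, while unique() keeps order of first appearance, so the two are only guaranteed to line up when the data are sorted by subject, as they are in the original example.

```r
# A small subset of the posted data: three subjects, five scores.
z <- cbind(x.sample = c(1, 1, 2, 2, 3),
           y = c(-1.2006469, 0.7615261, -0.1614918, -0.1464773, -0.5717729))

# Step 1: split the scores into one vector per subject.
groups <- split(z[, "y"], z[, "x.sample"])

# Step 2: take the first element of each vector; "[" is the subsetting
# function, so this calls groups[[i]][1] for each subject.
first <- sapply(groups, "[", 1)

# Step 3: bind subject IDs to their first scores. The IDs are recovered
# from the names that split() attached, so the pairing stays correct
# even if z is not sorted by subject.
result <- cbind(x.sample = as.numeric(names(first)), y = first)
result
```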
--
Matthew C Keller
Postdoctoral Fellow
Virginia Institute for Psychiatric and Behavioral Genetics