[R] vectorizing: selecting one record per group

David Winsemius dwinsemius at comcast.net
Wed Oct 13 22:29:15 CEST 2010


On Oct 13, 2010, at 4:17 PM, Erik Iverson wrote:

> Hello,
>
> There are probably many ways to do this, but I think
> it's easier if you use a data.frame as your object.
>
> The easy solution for the matrix you provide is escaping
> me at the moment.

Perhaps using sampling to derive an index?

A[ tapply(1:100, A[,3], sample, 1), ]

 > A[tapply(1:100, A[,3], sample, 1), ]
            [,1]      [,2] [,3]
[1,] -1.9512142 0.9823905    1
[2,]  1.4983879 0.4961661    2
[3,]  0.7815468 0.3531835    3
[4,] -0.9210731 0.6508500    4
[5,]  0.2354838 0.8616220    5
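One caution about the one-liner above. When x is a single positive number, sample(x, 1) draws from 1:x. A group that contains only one row (say row 37) would therefore draw from 1:37 instead of returning 37. Here every group has 20 rows, so the example is safe. Below is a minimal sketch of a more defensive variant. The helper pick_one_per_group() is hypothetical and not from this thread. It uses seq_along() instead of the hard-coded 1:100 and assumes, as in the question, that the group is the third column.

```r
# Hypothetical helper (not from the thread): return one randomly chosen
# row index per distinct value of `group`, ordered by sorted group value.
pick_one_per_group <- function(group) {
  idx <- tapply(seq_along(group), group,
                function(i) i[sample.int(length(i), 1L)])  # safe for length-1 groups
  as.vector(idx)  # drop the dimnames tapply() attaches
}

set.seed(1)
A <- cbind(rnorm(100), runif(100), rep(1:5, 20))
SEL <- A[pick_one_per_group(A[, 3]), ]
SEL
```

Indexing with i[sample.int(length(i), 1L)] always picks an element of i itself, whatever the group size.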

--  
David.

>
> One solution, using the plyr package:
>
>
> library(plyr)
> A <- data.frame(a = rnorm(100), b = runif(100), c = rep(c(1,2,3,4,5), 20))
> ddply(A, .(c), function(x) x[sample(1:nrow(x), 1), ])
>
>            a         b c
> 1  0.02995847 0.4763819 1
> 2  0.72035194 0.2948611 2
> 3  1.34963917 0.2057488 3
> 4 -1.99427160 0.1147923 4
> 5 -0.73612703 0.5889539 5
>
>
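If plyr is not available, base R offers an alternative. This is a different technique from ddply and was not posted in this thread. The idea is to shuffle the rows once and then keep the first row seen for each group. Because the shuffle is a uniform random permutation, the first member of each group is equally likely to be any of that group's rows. The sketch below assumes a data.frame like the one built above, with the group in column c.

```r
set.seed(42)
A <- data.frame(a = rnorm(100), b = runif(100), c = rep(1:5, 20))

# Randomly permute the rows, then keep the first occurrence of each group value.
shuffled <- A[sample(nrow(A)), ]
SEL <- shuffled[!duplicated(shuffled$c), ]

# Optional: put the groups back in order.
SEL <- SEL[order(SEL$c), ]
SEL
```

duplicated() always keeps the first occurrence, so all of the randomness comes from the sample() permutation.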
> Mauricio Romero wrote:
>> Hi,
>> I want to select a subsample from my data, choosing one record from
>> each group. I know how to do this with a for loop.
>> For example, let's say I have the data:
>> A <- cbind(rnorm(100), runif(100), rep(c(1, 2, 3, 4, 5), 20))
>> where the third column is the group variable. What I want is to
>> select 5 observations, one taken at random from each group.
>> INDEX <- NULL
>> i <- 1
>> for (index_g in unique(A[, 3])) {
>>   INDEX[i] <- sample(which(A[, 3] == index_g), 1)
>>   i <- i + 1
>> }
>> SEL <- A[INDEX, ]
>> Is there a way to do this without a “for”? In other words, is there a
>> way to “vectorize” this?
>> Thank you,
>>  Mauricio Romero Quantil S.A.S.
>> Bogotá,Colombia
>> www.quantil.com.co
>> "It is from the earth that we must find our substance; it is on the
>> earth that we must find solutions to the problems that promise to
>> destroy all life here"
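For comparison, the loop in the question can be restated with sapply() over the distinct groups. This sketch is not truly vectorized, because sapply() still iterates at the R level. It does remove the manual counter i and the growing INDEX vector. It also indexes with sample.int(), which avoids sample()'s special case for a length-1 input.

```r
set.seed(123)
A <- cbind(rnorm(100), runif(100), rep(c(1, 2, 3, 4, 5), 20))

# One randomly chosen row index per group, in order of first appearance
# (the same order the original for loop produced).
INDEX <- sapply(unique(A[, 3]), function(g) {
  rows <- which(A[, 3] == g)
  rows[sample.int(length(rows), 1L)]
})
SEL <- A[INDEX, ]
SEL
```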
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


