[R] Randomly select one row by group from a matrix

Bert Gunter bgunter.4567 at gmail.com
Thu May 18 19:38:46 CEST 2017


If I understand correctly, this is easily accomplished in base R via
?tapply and indexing.

e.g.

set.seed(1234) ## for reproducibility
grp <- sample.int(5, size = 30, replace = TRUE) ## a grouping vector
## Could be just a column of your matrix or frame

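## pick one row number per group; indexing via sample.int() stays correct for
## a group with a single row, whereas sample(n, size = 1) would draw from 1:n
indx <- tapply(seq_along(grp), grp, function(i) i[sample.int(length(i), 1)])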
> indx  ## just to show you what you get
 1  2  3  4  5
19 15 10  6 14

## now just use indx to extract rows of your matrix or data frame, d:

selected <- d[indx, , drop = FALSE] ## one row per group, dimensions kept
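
For completeness, here is a minimal sketch of the same idea applied to the
matrix from the original question, where each group has only one row. The
helper name pick_one is hypothetical (it is not part of base R); it is only
there to sidestep the special case in ?sample where a single number n is
treated as 1:n. The first lines also show why the split() attempt failed.

```r
## The matrix from the original question: one row per group
test <- matrix(c(4, 4, 6, 2, 1, 2), nrow = 2, ncol = 3,
               dimnames = list(NULL, c("xcor", "ycor", "id")))

## split() on a matrix returns plain vectors, so nrow() of a piece is NULL
pieces <- split(test, test[, "id"])
is.null(nrow(pieces[[1]]))

## pick_one() is a hypothetical helper, not a base R function.
## It samples a position within the group, so a group holding the single
## row number 7 returns 7 instead of a draw from 1:7.
pick_one <- function(i) i[sample.int(length(i), 1)]

## Sample one row number per group, then index the matrix once
indx <- tapply(seq_len(nrow(test)), test[, "id"], pick_one)
selected <- test[indx, , drop = FALSE]   # drop = FALSE keeps a matrix
```

Because only row numbers are split and the matrix itself is indexed once at
the end, the dimension information is never lost, whether a group has one
row or several.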


Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Thu, May 18, 2017 at 8:45 AM, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
> Hi Marine,
>
> your manipulation of the matrix is quite convoluted, and it helps to expand
> a bit:
>
> test_lst <- split(test, test[,c("id")])
> test_lst$`1`
>
> after splitting, each piece is a plain vector rather than a matrix, which
> makes the sampling fail.
>
> The reason is that a matrix is, behind the scenes, a vector with a
> dimension attribute, and splitting the matrix loses the dimension information.
>
> Do you really need to work with a matrix? I prefer data.frames because I
> can mix different types. Also, with a data.frame you can use the functionality
> of the dplyr package, which also makes things more readable:
>
> library(dplyr)
>
> test_df <- data.frame(xcor = rnorm(8), ycor = rnorm(8), id = c(1, 2))
>
> grouped_test_df <- group_by(test_df, id)
> sample_n(grouped_test_df, 1)
>
> HTH
> Ulrik
>
>
>
> On Thu, 18 May 2017 at 17:18 Marine Regis <marine.regis at hotmail.fr> wrote:
>
>> Hello,
>> I would like to randomly select one row by group from a matrix. Here is an
>> example where there is one row by group. The code gives an error message:
>> test <- matrix(c(4,4, 6,2, 1,2), nrow = 2, ncol = 3, dimnames = list(NULL,
>> c("xcor", "ycor", "id")))
>> do.call(rbind, lapply(split(test, test[,c("id")]), function(x)
>> x[sample(nrow(x), 1), ]))
>> Error in sample.int(length(x), size, replace, prob) :
>>   invalid first argument
>>
>>
>> How can I modify the code so that it works when there are several rows or
>> one row for a given group?
>> Thanks very much for your time
>> Have a nice day
>> Marine
>>
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


