[R] Subsetting with a list of vectors

David Winsemius dwinsemius at comcast.net
Sun May 23 17:03:08 CEST 2010


On May 23, 2010, at 10:00 AM, Kang Min wrote:

> Hi,
>
> I have a dataset that looks like the one below.
>
> data
> plot     plantno.    species
> H          31             ABC
> D          2               DEF
> Y          54             GFE
> E          12             ERF
> Y          98             FVD
> H          4               JKU
> J           7               JFG
> A          55             EGD
> .            .                 .
> .            .                 .
> .            .                 .
>
> I want to select rows belonging to 7 random plots for 100 times.

So you should be thinking about a function that will do what you want  
exactly once and then wrapping it in replicate().


> (There are 50 plots in total)
> So I created a list of 100 vectors, each vector has 7 elements.
>
> samp <- lapply(1:100, function(i) sample(LETTERS))

Please. "Minimal"!!!   5 samples should be enough for testing.

> samp2 <- lapply(samp2, "[", 1:7)
>
> How can I select the 26 plots from 'data' using 'samp'?
>
> samp3 <- sample(LETTERS, 7)

You do not want to sample from LETTERS but rather from the vector of  
data named "plot". Otherwise you will not be creating a representative  
sample. And ... "plot" is a really crappy name for a column. Try to  
avoid naming your columns with names that are common functions.  
Confusion of the humans reading your code is the predictable result,  
and occasional "confusion" of the R interpreter also may occur.

[After reading your reply to Holtman.... Or maybe you do want to  
sample from LETTERS. The fix would be obvious.]
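
A minimal sketch of that fix, assuming the labels really should come from LETTERS and that a list of label vectors has already been built. The data frame `dat` is a hypothetical stand-in for 'data', typed in from the rows shown in the post, and `subsets` is an illustrative name only:

```r
# Hypothetical stand-in for the poster's 'data' (rows copied from the post)
dat <- data.frame(
  plot    = c("H", "D", "Y", "E", "Y", "H", "J", "A"),
  plantno = c(31, 2, 54, 12, 98, 4, 7, 55),
  species = c("ABC", "DEF", "GFE", "ERF", "FVD", "JKU", "JFG", "EGD"),
  stringsAsFactors = FALSE
)

set.seed(123)
# Five vectors of 7 labels each, drawn from LETTERS
samp <- lapply(1:5, function(i) sample(LETTERS, 7))

# One subset of 'dat' per label vector; the result is a list of data frames
subsets <- lapply(samp, function(lbls) dat[dat$plot %in% lbls, ])
```

If a for loop is preferred, it needs to run over seq_along(samp), because nrow() of a list is NULL, and each result has to be stored in its own list element rather than overwriting a single object.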

> samp4 <- subset(data, plot %in% samp3) # this works

So this is what you want to do once:

samp1 <- function() subset(data, plot %in% sample(data$plot, 7) )

samp10 <- replicate(10, samp1())

samp10[,1] will be one sampled subset. (samp10 is now an array of lists.)
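
The "array of lists" shape comes from replicate() simplifying its result by default. A minimal sketch, assuming a small stand-in data frame `dat` (hypothetical, typed in from the post) and an illustrative helper `one_draw`, of how simplify = FALSE keeps each draw as an ordinary data frame instead:

```r
# Hypothetical stand-in for the poster's 'data' (rows copied from the post)
dat <- data.frame(
  plot    = c("H", "D", "Y", "E", "Y", "H", "J", "A"),
  plantno = c(31, 2, 54, 12, 98, 4, 7, 55),
  species = c("ABC", "DEF", "GFE", "ERF", "FVD", "JKU", "JFG", "EGD"),
  stringsAsFactors = FALSE
)

# One draw: all rows whose plot label is among 3 sampled labels
one_draw <- function() dat[dat$plot %in% sample(dat$plot, 3), ]

set.seed(1)
# Default: simplified to a matrix of mode list (columns of 'dat' x replicates)
as_matrix <- replicate(5, one_draw())
# simplify = FALSE: a plain list holding 5 data frames
as_list <- replicate(5, one_draw(), simplify = FALSE)

dim(as_matrix)        # one row per column of 'dat', one column per replicate
class(as_list[[1]])   # each element is still a data frame
```

With the list form, as_list[[1]] is a data frame that can be passed straight to other functions, whereas as_matrix[,1] is a bare list of columns.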

Unfortunately, I noticed that even with the minimal "data" example you
provided (not in reproducible form, unfortunately) I was getting 7 or 8
rows, and realized that using letters to subset was creating some
overlaps whenever "H" was sampled. So this is safer:

samp1 <- function() data[ sample(1:nrow(data), 7 ),]
samp5 <- replicate(5, samp1() )
for(i in 1:5) print(samp5[,i])

Then I noticed your reply to Holtman, so perhaps you really do want
the first solution. Just understand that it might not be
statistically correct.
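
One possible middle ground, offered only as a sketch: if the goal is all rows from a fixed number of distinct plots, the labels can be drawn from unique(plot), so repeated labels such as "H" cannot be picked twice. The names `dat` and `pick_plots` are hypothetical, and 3 plots are drawn here only because the stand-in data has 6 distinct labels:

```r
# Hypothetical stand-in for the poster's 'data' (rows copied from the post)
dat <- data.frame(
  plot    = c("H", "D", "Y", "E", "Y", "H", "J", "A"),
  plantno = c(31, 2, 54, 12, 98, 4, 7, 55),
  species = c("ABC", "DEF", "GFE", "ERF", "FVD", "JKU", "JFG", "EGD"),
  stringsAsFactors = FALSE
)

# Draw k distinct plot labels, then keep every row belonging to those plots
pick_plots <- function(k) {
  chosen <- sample(unique(dat$plot), k)
  dat[dat$plot %in% chosen, ]
}

set.seed(42)
draws <- replicate(5, pick_plots(3), simplify = FALSE)

# Number of distinct plots in each draw
n_plots <- sapply(draws, function(d) length(unique(d$plot)))
```

Each draw then contains exactly k plots, although the number of rows still varies with how many plants each chosen plot holds.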

-- 
David.



> samp5 <- subset(data, plot %in% samp2[[1]]) # this works as well, but
> I used a for loop to get it to select 7 plots 100 times.
>
> for (i in nrow(samp2)) {
>      samp6 <- subset(data, plot %in% samp2[[i]])
> } # this doesn't work
>
> Am I missing something, or is there a better solution?
>
> Thanks.
> Kang Min
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


