[R] Subsetting with a list of vectors
David Winsemius
dwinsemius at comcast.net
Sun May 23 17:03:08 CEST 2010
On May 23, 2010, at 10:00 AM, Kang Min wrote:
> Hi,
>
> I have a dataset that looks like the one below.
>
> data
> plot plantno. species
> H 31 ABC
> D 2 DEF
> Y 54 GFE
> E 12 ERF
> Y 98 FVD
> H 4 JKU
> J 7 JFG
> A 55 EGD
> . . .
> . . .
> . . .
>
> I want to select rows belonging to 7 random plots for 100 times.
So you should be thinking about a function that will do what you want
exactly once and then wrapping it in replicate().
> (There are 50 plots in total)
> So I created a list of 100 vectors, each vector has 7 elements.
>
> samp <- lapply(1:100, function(i) sample(LETTERS))
Please. "Minimal"!!! 5 samples should be enough for testing.
> samp2 <- lapply(samp2, "[", 1:7)
>
> How can I select the 26 plots from 'data' using 'samp'?
>
> samp3 <- sample(LETTERS, 7)
You do not want to sample from LETTERS but rather from the vector of
data named "plot". Otherwise you will not be creating a representative
sample. And ... "plot" is a really crappy name for a column. Try to
avoid naming your columns with names that are common functions.
Confusion of the humans reading your code is the predictable result,
and occasional "confusion" of the R interpreter also may occur.
[After reading your reply to Holtman.... Or maybe you do want to
sample from LETTERS. The fix would be obvious.]
> samp4 <- subset(data, plot %in% samp3) # this works
So this is what you want to do once:
samp1 <- function() subset(data, plot %in% sample(data$plot, 7) )
samp5 <- replicate(5, samp1())
samp5[,1] will be one sampled subset. (samp5 is now an array of lists.)
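If the array-of-lists result is awkward to work with, one alternative is to pass simplify = FALSE to replicate(), which keeps each draw as an ordinary data frame inside a list. This is only a sketch: the data frame `dat`, the column name `plot_id` (used instead of "plot") and the helper `one_draw` are made up for illustration and are not from the thread.

```r
# Made-up stand-in for the poster's data; 'plot_id' replaces the column name 'plot'
set.seed(1)
dat <- data.frame(plot_id = sample(LETTERS[1:10], 40, replace = TRUE),
                  plantno = sample(1:99, 40),
                  stringsAsFactors = FALSE)

# One draw: every row whose plot_id matches one of 7 values sampled from the column
one_draw <- function() dat[dat$plot_id %in% sample(dat$plot_id, 7), ]

# simplify = FALSE stops replicate() from collapsing the results into an array
draws <- replicate(5, one_draw(), simplify = FALSE)

str(draws[[1]])   # an ordinary data frame, one per draw
```

Each element can then be handled with lapply(), e.g. lapply(draws, nrow).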
Unfortunately, I noticed that even with the minimal "data" example you
provided (not in reproducible form, unfortunately) I was getting 7 or 8
rows, and realized that using letters to subset was creating some
overlaps whenever "H" was sampled. So this is safer:
samp1 <- function() data[ sample(1:nrow(data), 7 ),]
samp5 <- replicate(5, samp1() )
for(i in 1:5) print(samp5[,i])
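The overlap is easy to reproduce with the eight rows shown in the question. In the sketch below the data frame `dat` and the column name `plot_id` are my own stand-ins; plots "H" and "Y" each occupy two rows, so matching sampled plot labels with %in% can return 8 rows, while sampling row indices always returns exactly 7.

```r
# The eight rows from the question, with 'plot' renamed to 'plot_id'
dat <- data.frame(plot_id = c("H", "D", "Y", "E", "Y", "H", "J", "A"),
                  plantno = c(31, 2, 54, 12, 98, 4, 7, 55),
                  species = c("ABC", "DEF", "GFE", "ERF", "FVD", "JKU", "JFG", "EGD"),
                  stringsAsFactors = FALSE)

set.seed(42)

# Row counts when sampling plot labels and matching them back with %in%
by_value <- replicate(20, nrow(dat[dat$plot_id %in% sample(dat$plot_id, 7), ]))

# Row counts when sampling row indices directly
by_row <- replicate(20, nrow(dat[sample(seq_len(nrow(dat)), 7), ]))

table(by_value)   # every count is 7 or 8
table(by_row)     # every count is 7
```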
Then I noticed your reply to Holtman, so perhaps you really do want
the first solution. Just so you understand that it might not be
statistically correct.
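If the goal really is seven distinct plots together with all of their rows, one way to avoid drawing the same plot label twice is to sample from the unique labels. This is a hedged sketch rather than code from the thread: `dat`, `plot_id` and the helper `sample_plots` are hypothetical names, and the data are made up (26 plots with 4 plants each).

```r
# Made-up data: 26 plots, 4 plants per plot
dat <- data.frame(plot_id = rep(LETTERS, each = 4),
                  plantno = seq_len(26 * 4),
                  stringsAsFactors = FALSE)

# Pick n_plots distinct plot labels, then keep every row from those plots
sample_plots <- function(d, n_plots = 7) {
  chosen <- sample(unique(d$plot_id), n_plots)
  d[d$plot_id %in% chosen, ]
}

set.seed(7)
draws <- replicate(5, sample_plots(dat), simplify = FALSE)

# Number of distinct plots in each draw
sapply(draws, function(x) length(unique(x$plot_id)))
```

Note that this gives every plot the same chance of selection regardless of how many plants it contains, which may or may not be what the analysis calls for.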
--
David.
> samp5 <- subset(data, plot %in% samp2[[1]]) # this works as well, but
> I used a for loop to get it to select 7 plots 100 times.
>
> for (i in nrow(samp2)) {
> samp6 <- subset(data, plot %in% samp2[[i]])
> } # this doesn't work
>
> Am I missing something, or is there a better solution?
>
> Thanks.
> Kang Min
David Winsemius, MD
West Hartford, CT