[R] Sample rows in data frame by subsets

Liaw, Andy andy_liaw at merck.com
Mon Jan 23 21:48:02 CET 2006


Here's one way, if you want to do it in one command:

do.call("rbind", lapply(split(d, d$fac),
    function(x) x[sample(nrow(x), nrow(x), replace=TRUE), ]))

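split() splits the data frame into a list of data frames, one per level of
d$fac.  The lapply() call returns a list of the same shape, with each
component replaced by a with-replacement resample of its own rows (the same
number of rows as the original component).  do.call("rbind", ...) then binds
the pieces back into a single data frame.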
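
Since there are many data frames to resample, the same steps could be wrapped
in a small function.  This is only a sketch: the name resample_by and its
arguments are made up for illustration, and the example data frame simply
copies the fac values shown in the question below.

```r
# Hypothetical helper (not from the original post): resample rows with
# replacement within each level of the grouping column named by `fac`.
resample_by <- function(d, fac) {
  pieces <- split(d, d[[fac]])              # one data frame per level
  resampled <- lapply(pieces, function(x) {
    # draw nrow(x) row indices with replacement; drop = FALSE keeps a
    # data frame even if x has a single column
    x[sample(nrow(x), nrow(x), replace = TRUE), , drop = FALSE]
  })
  do.call("rbind", resampled)               # bind the pieces back together
}

set.seed(1)  # only to make the example reproducible
d <- data.frame(x = 1, y = 1:10,
                fac = factor(c("A", "A", "A", "A", "C",
                               "C", "B", "A", "C", "A")))
r <- resample_by(d, "fac")
table(r$fac)  # same counts per level as table(d$fac)
```

If the data frames are held in a list (say dfs, also a made-up name),
lapply(dfs, resample_by, fac = "fac") would resample each of them in turn.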

Andy

From: Chris Stubben
> 
> Hi,
> 
> I need to resample rows in a data frame by subsets
> 
> L3 <- LETTERS[1:3]
> d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE))
>     x  y fac
> 1  1  1   A
> 2  1  2   A
> 3  1  3   A
> 4  1  4   A
> 5  1  5   C
> 6  1  6   C
> 7  1  7   B
> 8  1  8   A
> 9  1  9   C
> 10 1 10   A
> 
> I have seen this used to sample rows with replacement
> 
> d[sample(nrow(d), replace=TRUE), ]
> 
>      x  y fac
> 7   1  7   B
> 2   1  2   A
> 1   1  1   A
> 3   1  3   A
> 2.1 1  2   A
> 10  1 10   A
> 8   1  8   A
> 9   1  9   C
> 1.1 1  1   A
> 8.1 1  8   A
> 
> 
> but I would like to resample within each level of fac, keeping the
> original number of rows per level
> 
> summary(d$fac)
> A B C
> 6 1 3
> 
> 
> rbind(subset(d, fac=="A")[sample(6, replace=TRUE), ],
>        subset(d, fac=="B")[sample(1, replace=TRUE), ],
>        subset(d, fac=="C")[sample(3, replace=TRUE), ] )
> 
>      x  y fac
> 2   1  2   A
> 3   1  3   A
> 3.1 1  3   A
> 1   1  1   A
> 10  1 10   A
> 1.1 1  1   A
> 7   1  7   B
> 5   1  5   C
> 6   1  6   C
> 5.1 1  5   C
> 
> 
> Is there an easy way to do this in one step or with a short function?
> I have lots of data frames to resample.
> 
> Thanks,
> 
> Chris
> 
> 
> -- 
> -----------------
> Chris Stubben
> 
> Los Alamos National Lab
> BioScience Division
> MS M888
> Los Alamos, NM 87545
> 



