[R] Sample rows in data frame by subsets
Liaw, Andy
andy_liaw at merck.com
Mon Jan 23 21:48:02 CET 2006
Here's one way, if you want to do it in one command:
do.call("rbind", lapply(split(d, d$fac),
        function(x) x[sample(nrow(x), nrow(x), replace=TRUE), ]))
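split() splits the data frame into a list of data frames, one per level of
d$fac. The lapply() call then returns a list of the same shape, with each
component replaced by a resample (with replacement, same number of rows) of
the original component. do.call("rbind", ...) then stacks the pieces back
into a single data frame.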
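
Since you have many data frames to resample, the same idea can be wrapped in
a small function. This is only a sketch: the name resample_by_group and its
"group" argument are made up for illustration, and it assumes the grouping
variable is a column of the data frame.

```r
# Hypothetical helper (name and arguments are illustrative, not from any package):
# resample rows with replacement within each level of a grouping column,
# keeping each group's original size.
resample_by_group <- function(d, group) {
  # One data frame per group; drop = TRUE skips unused factor levels.
  pieces <- split(d, d[[group]], drop = TRUE)
  resampled <- lapply(pieces, function(x) {
    # Draw nrow(x) row indices with replacement from this group only.
    x[sample.int(nrow(x), replace = TRUE), , drop = FALSE]
  })
  # Stack the resampled groups back into one data frame.
  do.call("rbind", resampled)
}

# Example data with the same group sizes as in the question (A = 6, B = 1, C = 3).
d <- data.frame(x = 1, y = 1:10,
                fac = factor(c("A", "A", "A", "A", "C", "C", "B", "A", "C", "A")))
b <- resample_by_group(d, "fac")
table(b$fac)
```

For a list of data frames (say, a hypothetical list called dfs), something
like lapply(dfs, resample_by_group, group = "fac") should apply it to each one.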
Andy
From: Chris Stubben
>
> Hi,
>
> I need to resample rows in a data frame by subsets
>
> L3 <- LETTERS[1:3]
> d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE))
> x y fac
> 1 1 1 A
> 2 1 2 A
> 3 1 3 A
> 4 1 4 A
> 5 1 5 C
> 6 1 6 C
> 7 1 7 B
> 8 1 8 A
> 9 1 9 C
> 10 1 10 A
>
> I have seen this used to sample rows with replacement
>
> d[sample(nrow(d), replace=T), ]
>
> x y fac
> 7 1 7 B
> 2 1 2 A
> 1 1 1 A
> 3 1 3 A
> 2.1 1 2 A
> 10 1 10 A
> 8 1 8 A
> 9 1 9 C
> 1.1 1 1 A
> 8.1 1 8 A
>
>
> but I would like to sample based on the original number in fac
>
> summary(d$fac)
> A B C
> 6 1 3
>
>
> rbind(subset(d, fac=="A")[sample(6, replace=T), ],
> subset(d, fac=="B")[sample(1, replace=T), ],
> subset(d, fac=="C")[sample(3, replace=T), ] )
>
> x y fac
> 2 1 2 A
> 3 1 3 A
> 3.1 1 3 A
> 1 1 1 A
> 10 1 10 A
> 1.1 1 1 A
> 7 1 7 B
> 5 1 5 C
> 6 1 6 C
> 5.1 1 5 C
>
>
> Is there an easy way to do this in one step or with a short function?
> I have lots of data frames to resample.
>
> Thanks,
>
> Chris
>
>
> --
> -----------------
> Chris Stubben
>
> Los Alamos National Lab
> BioScience Division
> MS M888
> Los Alamos, NM 87545
>