[R] sampling from data.frame
Charles C. Berry
cberry at tajo.ucsd.edu
Wed Dec 3 02:19:12 CET 2008
On Wed, 3 Dec 2008, axionator wrote:
> Hi all,
> I have a data frame with "clustered" rows as follows:
> Cu1 x1 y1 z1 ...
> Cu1 x2 y2 z2 ...
> Cu1 x3 y3 z3 ... # end of first cluster Cu1
> Cu2 x4 y4 z4 ...
> Cu2 x5 y5 z5
> Cu2 ... # end of second cluster Cu2
> Cu3 ...
> ...
> "cluster"-size is 3 in the example above (rows making up a cluster are
> always consecutive). Is there any faster way to sample n clusters
> (with replacement) from this dataframe and build up a new data frame
> out of these sampled clusters? I use the "sample" function and a
> for-loop.
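Something like this, assuming the cluster labels are in a column df$cluster and n.samps is the number of clusters to draw:
# split() yields one data frame per cluster; sample() draws whole clusters
cl.samps <- sample( split( df, df$cluster ), n.samps, replace=TRUE )
# stack the sampled clusters into a single data frame
do.call( rbind, cl.samps )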
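A minimal self-contained sketch of the same idea, in case it helps to see it run. The toy data frame, its column names (cluster, x), and the name boot.df are made up for illustration and are not from the original post:

```r
# Toy data: 4 clusters of 3 consecutive rows each (hypothetical values).
set.seed(1)
df <- data.frame(cluster = rep(paste0("Cu", 1:4), each = 3),
                 x = rnorm(12))
n.samps <- 5

# split() returns a list with one data frame per cluster;
# sample() then draws list elements (whole clusters) with replacement.
cl.samps <- sample(split(df, df$cluster), n.samps, replace = TRUE)

# Stack the sampled clusters into one data frame.
boot.df <- do.call(rbind, cl.samps)
nrow(boot.df)  # n.samps clusters of 3 rows each
```

Because the sampling is done on the list of clusters rather than on rows, no explicit for-loop is needed.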
If you need to identify the samples from which the rows came (versus just
the originating clusters):
# tag each drawn cluster with its position in the sample
cl.samps2 <- lapply( seq_along( cl.samps ),
    function(x) cbind( cl.samps[[ x ]], new.cluster = x ) )
do.call( rbind, cl.samps2 )
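A small runnable sketch of the tagging step, again on made-up data (the data frame, the column x, and the name boot.df are assumptions for illustration only). It shows that a cluster drawn twice ends up under two different new.cluster values:

```r
# Toy data: 3 clusters of 3 rows each (hypothetical values).
set.seed(2)
df <- data.frame(cluster = rep(c("Cu1", "Cu2", "Cu3"), each = 3),
                 x = 1:9)

# Draw 4 clusters with replacement, so at least one cluster repeats.
cl.samps <- sample(split(df, df$cluster), 4, replace = TRUE)

# Tag each draw with its position in the sample; cbind() recycles the
# single index value across all rows of that draw.
cl.samps2 <- lapply(seq_along(cl.samps),
                    function(i) cbind(cl.samps[[i]], new.cluster = i))
boot.df <- do.call(rbind, cl.samps2)

# Rows per draw, and which original cluster each draw came from.
table(boot.df$new.cluster)
unique(boot.df[, c("new.cluster", "cluster")])
```

The original cluster column is kept alongside new.cluster, so both the source cluster and the draw it belongs to remain identifiable.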
HTH,
Chuck
>
> Thanks in advance
> Armin
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901