[R] strata -- really slow performance
hadley wickham
h.wickham at gmail.com
Mon Jul 13 06:56:59 CEST 2009
> In this simple example, it took less than half a second to generate the
> result. That is on a 2.93 GHz MacBook Pro.
>
>
> So, for your data, the code would look something like this:
>
>
> system.time(DF.new <- do.call(rbind,
>     lapply(split(patch_summary,
>                  patch_summary$UniqueID),
>            function(x) x[sample(nrow(x), 1), ])))

For large data, you can make it even faster with:

sample_rows <- function(df, n) {
  df[sample(nrow(df), n), ]
}

library(plyr)
system.time(DF.new <- ddply(DF, "ID", sample_rows, n = 1))

ddply uses some tricks to avoid copying DF, which really makes a
difference for large data. Unfortunately, it also increases the
overhead, so it is currently slower for small data.
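
As a rough illustration of why avoiding copies can help, here is a minimal
base R sketch that splits row indices instead of the data frame and then
subsets the original once. This is a swapped-in technique for comparison,
not a description of how ddply works internally. The toy DF, its ID and
value columns, and the names idx_by_group and picked are all assumptions
made up for the example.

```r
# Toy data; DF, ID and value are hypothetical names for illustration.
set.seed(1)
DF <- data.frame(ID = rep(1:1000, each = 5), value = rnorm(5000))

# Split the row indices by group rather than splitting the data frame,
# so no per-group copies of DF are created.
idx_by_group <- split(seq_len(nrow(DF)), DF$ID)

# Pick one row index per group. Indexing via sample.int() avoids the
# sample(x) behaviour where a single number x is treated as 1:x.
picked <- vapply(idx_by_group,
                 function(i) i[sample.int(length(i), 1)],
                 integer(1))

# Subset the original data frame once, using the chosen row indices.
DF.new <- DF[picked, ]
```

Wrapping the last three statements in system.time() on your own data
would show whether this is any faster there; timings will depend on the
number and size of the groups.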
Hadley
--
http://had.co.nz/