[R] sampling from data.frame
jim holtman
jholtman at gmail.com
Wed Dec 3 00:53:08 CET 2008
Not sure exactly what you mean by 'sample' since you did not provide
an example of the expected output, or input data that could be used.
Here is an example that draws one random row from each cluster:
> df <- data.frame(id=paste("C", rep(1:5, each=3), sep=''), data=1:15)
> # sample 1 from each cluster
> result <- lapply(split(seq(nrow(df)), df$id), function(.indx){
+ df[sample(.indx, 1),]
+ })
> do.call(rbind,result)
   id data
C1 C1    1
C2 C2    4
C3 C3    9
C4 C4   11
C5 C5   15
>
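One caveat with the call above: if a cluster has only one row, `.indx` is a single number, and `sample(.indx, 1)` then draws from `1:.indx` instead of returning that row's index. Here is a minimal sketch of a guard, assuming clusters may have different sizes. The helper name `pick` and the toy data frame `df2` are made up for illustration.

```r
# Hypothetical helper: choose k elements of x by position, so that a
# length-one x is never expanded to 1:x as sample() would do.
pick <- function(x, k = 1) x[sample.int(length(x), k)]

# Toy data (an assumption, not from the original post):
# cluster "B" has a single row, stored at row index 3.
df2 <- data.frame(id = c("A", "A", "B"), data = c(10, 20, 30))

rows <- split(seq_len(nrow(df2)), df2$id)
result <- lapply(rows, function(.indx) df2[pick(.indx), ])
out <- do.call(rbind, result)
```

With plain `sample(.indx, 1)`, cluster "B" could return row 1 or 2, which belong to cluster "A". With `pick`, it always returns row 3.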
> result <- lapply(split(seq(nrow(df)), df$id), function(.indx){
+ df[sample(.indx, 1),]
+ })
> do.call(rbind,result)
   id data
C1 C1    2
C2 C2    6
C3 C3    9
C4 C4   11
C5 C5   15
>
>
> result <- lapply(split(seq(nrow(df)), df$id), function(.indx){
+ df[sample(.indx, 1),]
+ })
> do.call(rbind,result)
   id data
C1 C1    3
C2 C2    4
C3 C3    8
C4 C4   10
C5 C5   13
>
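If what you want is to draw n whole clusters with replacement and stack all of their rows into a new data frame, the same `split` idea can be used without a for-loop. This is only a sketch under that reading of the question. The function name `sample_clusters` and its arguments are made up for illustration.

```r
# Same toy data as above: 5 clusters of 3 consecutive rows each.
df <- data.frame(id = paste("C", rep(1:5, each = 3), sep = ""), data = 1:15)

# Hypothetical helper: draw n clusters with replacement and return
# every row of each drawn cluster, in the order the clusters were drawn.
sample_clusters <- function(df, cluster, n) {
  # Row indices grouped by cluster label
  rows_by_cluster <- split(seq_len(nrow(df)), cluster)
  # Sample cluster labels, not rows; a label may be drawn more than once
  picked <- sample(names(rows_by_cluster), n, replace = TRUE)
  # Look up the rows of each drawn cluster and flatten to one index vector
  idx <- unlist(rows_by_cluster[picked], use.names = FALSE)
  out <- df[idx, , drop = FALSE]
  rownames(out) <- NULL  # drop the "1.1"-style names that repeated rows get
  out
}

set.seed(1)  # only to make the draw repeatable
boot <- sample_clusters(df, df$id, n = 4)
```

Here `boot` has 12 rows (4 clusters of 3 rows), and a cluster that was drawn twice appears twice. Since all the indexing happens in one step, this should be faster than growing a data frame inside a loop, though I have not timed it.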
On Tue, Dec 2, 2008 at 6:27 PM, axionator <axionator at gmail.com> wrote:
> Hi all,
> I have a data frame with "clustered" rows as follows:
> Cu1 x1 y1 z1 ...
> Cu1 x2 y2 z2 ...
> Cu1 x3 y3 z3 ... # end of first cluster Cu1
> Cu2 x4 y4 z4 ...
> Cu2 x5 y5 z5
> Cu2 ... # end of second cluster Cu2
> Cu3 ...
> ...
> "cluster"-size is 3 in the example above (rows making up a cluster are
> always consecutive). Is there any faster way to sample n clusters
> (with replacement) from this dataframe and build up a new data frame
> out of these sampled clusters? I use the "sample" function and a
> for-loop.
>
> Thanks in advance
> Armin
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?