[R] sampling from data frame
Bill.Venables@cmis.csiro.au
Thu Jun 6 09:57:39 CEST 2002
Maria Wolters asks:
> -----Original Message-----
> From: Maria Wolters [mailto:maria at rhetorical.com]
> Sent: Thursday, June 06, 2002 5:24 PM
> To: R-help Digest
> Cc: r-help-digest at stat.math.ethz.ch
> Subject: [R] sampling from data frame
>
>
> Hello,
>
> after searching through the archives and
> not finding a thread that answers this question,
> I thought I'd pass it on to the list.
>
> Given a data frame and given a factor variable
> that assigns a class to each case in the data frame,
> what is the most efficient way to sample
> a given number of cases from each class?
[WNV] Not clear what you mean. Let me take a stab. Suppose the
data frame is Dat and the factor is G. Furthermore suppose the classes are
G1, G2, ..., Gm and the vector k tells you how many you want from each
class, k[1] from G1, ... , k[m] from Gm.
Here is a way of sampling without replacement in this way, but I
would not say it is necessarily the most efficient:
    bits <- split(1:nrow(Dat), Dat$G)   # find the indices for each class
    wh <- unlist(sapply(1:m, function(x) sample(bits[[x]], k[x]),
                        simplify = FALSE))   # pick the samples
    DatSample <- Dat[wh, ]
This gives you the sample as a data frame consisting of the selected
rows of Dat. I'm not all that convinced that you need to do the picking
using sapply, in fact. Personally I'd probably just use a loop that I
didn't have to think about too hard:
    wh <- list()
    for(j in 1:m) wh[[j]] <- sample(bits[[j]], k[j])
    wh <- unlist(wh)
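
To try the loop version end to end, here is a small sketch on made-up data. The contents of Dat, the class sizes and the counts in k are invented purely for illustration; m is taken to be the number of levels of G, which is an assumption consistent with the description above.

```r
set.seed(1)                                   # only so the run is repeatable

# Hypothetical data: 12 cases in three classes of sizes 5, 4 and 3
Dat <- data.frame(x = rnorm(12),
                  G = factor(rep(c("G1", "G2", "G3"), times = c(5, 4, 3))))
k <- c(2, 3, 1)                               # how many cases to take from each class
m <- nlevels(Dat$G)                           # number of classes

bits <- split(1:nrow(Dat), Dat$G)             # row indices for each class
wh <- list()
for (j in 1:m) wh[[j]] <- sample(bits[[j]], k[j])   # sample without replacement
wh <- unlist(wh)
DatSample <- Dat[wh, ]

table(DatSample$G)                            # per-class counts, to compare with k
```

Note the double brackets in bits[[j]]: single brackets would return a one-element list rather than the vector of row indices, and sample() would then have only one item to draw from.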
If you want a constant number from each class, this is a bit
simpler.
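
One possible sketch of the constant-number case, assuming every class has at least n cases. The data and the value of n are hypothetical. Indexing through sample.int() is a precaution rather than part of the recipe above: sample(x, n) treats a single number x as 1:x, which would give wrong rows for a class with only one member.

```r
set.seed(2)                                   # only so the run is repeatable

# Hypothetical data: 12 cases in three classes of sizes 5, 4 and 3
Dat <- data.frame(x = rnorm(12),
                  G = factor(rep(c("G1", "G2", "G3"), times = c(5, 4, 3))))
n <- 2                                        # same number of cases from every class

bits <- split(seq_len(nrow(Dat)), Dat$G)      # row indices for each class
# Draw n positions within each class, then map them back to row indices
wh <- unlist(lapply(bits, function(idx) idx[sample.int(length(idx), n)]),
             use.names = FALSE)
DatSample <- Dat[wh, ]
```

Because n is the same for every class, there is no need to loop over class numbers; lapply() can walk over bits directly.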
> I've found a roundabout solution that works as follows:
> for each class:
> assign unique index to each class member
> chosen_cases <- sample(n,indexvariable)
> extract chosen_cases from data frame
> (i.e. chosen <- subset(data, indexvariable %in% chosen_cases))
[WNV] I know this is meta-code, but using _ in names can be a bit
ambiguous, since R also accepts _ as an assignment operator.
> this solution relies on the Hmisc library and is
> horribly inefficient. Any ideas on how to make it better
> would be greatly appreciated.
>
> Best from Edinburgh,
[WNV] Same from Brisbane, where I suspect the temperature is
getting close to that in Edinburgh right now. In July you might just have
that edge, though... :-)
Bill Venables
> Maria
>
> --
> Maria Wolters maria.wolters
> Development Engineer AT
> Rhetorical Systems Ltd. rhetorical.com
> Edinburgh
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._