[R] Permutations and large data sets

Chris Miller chrisamiller at gmail.com
Wed Nov 12 23:47:25 CET 2008


I have 200 samples, with 1 million data points in each. Each data
point can have a value from zero to 10, and we can assume that they're
normally distributed. If I calculate a sum by drawing one random data
point from each sample and adding them, what value does that sum need
to be before I can say that it's higher than 95% of the other possible
sums (with reasonable probability)?

The brute-force way to do this is to calculate all possible sums, sort
them, then find the value 95% of the way through the list. Obviously,
this won't work, since the number of possible combinations
(1,000,000^200) is astronomical. So what's the appropriate way to
approximate this, using R?
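
For concreteness, here is a rough sketch of two ways the cutoff might be
approximated. It is not a definitive answer. It assumes the data sit in
a matrix with one column per sample; the matrix `samples` below is
simulated stand-in data, scaled down to 1,000 points per sample, and
every name in it (`samples`, `n_sims`, `mc_cutoff`, `clt_cutoff`) is
made up for illustration. The first estimate is a Monte Carlo one: draw
many random sums and take their empirical 95th percentile. The second
treats the sum of 200 independent draws as approximately normal, with
mean equal to the sum of the sample means and variance equal to the sum
of the sample variances.

```r
set.seed(1)  # only so the sketch is reproducible

# Stand-in data (assumption): one column per sample, values clipped to [0, 10].
# n_points is scaled down from 1e6 to keep the example quick.
n_samples <- 200
n_points  <- 1000
samples <- matrix(
  pmin(pmax(rnorm(n_samples * n_points, mean = 5, sd = 1.5), 0), 10),
  nrow = n_points, ncol = n_samples
)

# --- Approach 1: Monte Carlo ---
# Each simulated sum uses one randomly chosen point from every sample.
n_sims  <- 20000
row_idx <- matrix(sample.int(n_points, n_sims * n_samples, replace = TRUE),
                  nrow = n_sims, ncol = n_samples)
col_idx <- col(row_idx)  # column j of row_idx indexes into sample j

# Index the data matrix with (row, column) pairs, then reshape so that
# each row holds the 200 values drawn for one simulated sum.
picked <- samples[cbind(as.vector(row_idx), as.vector(col_idx))]
sums   <- rowSums(matrix(picked, nrow = n_sims))

mc_cutoff <- quantile(sums, 0.95, names = FALSE)

# --- Approach 2: normal approximation for the sum ---
# Mean of the sum = sum of sample means; variance = sum of sample variances.
# var() divides by n - 1, so rescale to the population variance that applies
# when a point is drawn uniformly from a fixed sample.
mu      <- sum(colMeans(samples))
pop_var <- apply(samples, 2, var) * (n_points - 1) / n_points
sigma   <- sqrt(sum(pop_var))

clt_cutoff <- qnorm(0.95, mean = mu, sd = sigma)

cat("Monte Carlo 95% cutoff:         ", mc_cutoff, "\n")
cat("Normal-approximation 95% cutoff:", clt_cutoff, "\n")
```

If the two numbers agree closely, that suggests the normal
approximation is adequate, and it only needs the 200 means and
variances, which stay cheap to compute at the full 1 million points per
sample. Raising `n_sims` tightens the Monte Carlo estimate at the cost
of memory and run time.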

Thanks,

Chris Miller


