[Rd] Listing all possible samples of size two from a large population

Ben Bolker bolker at ufl.edu
Fri May 30 18:11:31 CEST 2008


Nadeem Shafique <nadeemshafique <at> gmail.com> writes:

> 
> Respected All,
> 
> I need an efficient program or package to draw all possible samples
> of size two without replacement. I am using the "combinat" package to list
> all possible samples, but it hangs my computer for larger populations,
> say 10,000 (i.e. 49995000 possible samples). I would like to work with
> even larger populations than this, and to repeat this procedure many
> times. Could anyone suggest how to do this?
> 

50 million samples sounds like a lot already -- hope you
have a lot of memory (and I am tempted to wonder what you're
going to find out that a random subsample wouldn't tell you ...)
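If a random subsample of pairs would answer the question, the pairs can be drawn directly without listing them all. A minimal sketch, assuming pairs are drawn independently of each other (so the same pair can turn up more than once across draws); `random_pairs` is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper: draw K random samples of size two (no repeated unit
# within a sample) from units 1..N, without enumerating all choose(N, 2) pairs.
random_pairs <- function(N, K) {
  i <- sample.int(N, K, replace = TRUE)      # first unit, uniform on 1..N
  j <- sample.int(N - 1, K, replace = TRUE)  # second unit, uniform on the other N-1 units
  j <- j + (j >= i)                          # shift past i so that j != i
  cbind(i = i, j = j)
}

set.seed(1)
p <- random_pairs(10000, 1e5)
```

Every ordered pair of distinct units has the same probability 1/(N(N-1)), so every unordered pair is equally likely.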

object.size(numeric(5e7))/2^20
[1] 381.4697

That is already 381 MB for a single column of doubles (although maybe
you have a lot of memory), and you have to double that to hold both
elements of each pair.
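The arithmetic behind that estimate, as a rough sketch that ignores R's small per-object header: with N = 10,000 there are choose(N, 2) = 49,995,000 pairs, R stores doubles in 8 bytes and integers in 4, so two integer columns cost about as much as one column of doubles.

```r
n_pop   <- 10000
n_pairs <- choose(n_pop, 2)        # unordered pairs: 49995000
bytes_double <- n_pairs * 2 * 8    # two columns of 8-byte doubles
bytes_int    <- n_pairs * 2 * 4    # two columns of 4-byte integers
round(c(MiB_double = bytes_double, MiB_integer = bytes_int) / 2^20, 1)
# MiB_double 762.9, MiB_integer 381.4
```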

The algorithm for enumerating these samples by brute force is
pretty easy --

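N <- 5  # population size (small here; the output has N*(N-1)/2 lines)
for (i in 2:N) {
  for (j in 1:(i - 1)) {
    cat(i, j, "\n")  # one sample: units i and j, with j < i
  }
}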

  But of course these loops will be really slow for large N.
There may (?) be a way to do this in a vectorized fashion.
The only quick-and-dirty ways I can think of involve generating
all N^2 ordered pairs and then cutting them down to the
N*(N-1)/2 unordered ones, which is probably not worth the time, e.g.

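N <- 10
i <- 1:N
j <- 1:N
e <- expand.grid(i, j)            # all N^2 ordered pairs (Var1 varies fastest)
m <- matrix(1:nrow(e), nrow = N)  # m[i, j] is the row of e holding (i, j)
s <- e[m[lower.tri(m)], ]         # keep only the pairs with i > j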
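A fully vectorized alternative that avoids the N^2 intermediate can be sketched with base R's rep() and sequence(); `all_pairs` is a hypothetical name, and this is a minimal sketch rather than a tested package routine. It produces the pairs in the same order as the double loop above: (2,1), (3,1), (3,2), (4,1), ...

```r
# Hypothetical helper: all unordered pairs (i, j) with j < i, built without
# explicit loops and without the N^2 rows of expand.grid().
all_pairs <- function(N) {
  stopifnot(N >= 2)
  i <- rep(2:N, times = 1:(N - 1))  # i = 2 once, 3 twice, ..., N (N-1) times
  j <- sequence(1:(N - 1))          # 1; 1,2; 1,2,3; ...; 1..(N-1)
  cbind(i = i, j = j)               # integer matrix with N*(N-1)/2 rows
}

p <- all_pairs(5)
nrow(p)  # 10
```

For N = 10,000 this is an integer matrix of 49,995,000 rows (roughly 381 MiB), so memory remains the real constraint. On small N the result can be cross-checked against utils::combn(N, 2), which returns the same pairs as columns with the smaller unit first.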

  I would create a little snippet of C code to do this.
You could also look at the inline package (on CRAN) or
Ra and the jit package, although both of these are more
experimental than just writing the C code, compiling it,
and linking it in.

  Bottom line: this should be possible, but I don't
know of a package that does it automatically, and if I
were you I would think seriously about what question you
really want to answer and whether there's a less brute-force
way of doing it.
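One less brute-force option, if what is really needed is a summary over all samples (for example the sampling distribution of the mean for samples of size two), is to loop over i only, handle every j < i for that i in one vectorized step, and accumulate sums instead of storing the pairs. A minimal sketch; `pair_mean_moments` is a hypothetical helper and the choice of statistic is only an illustration:

```r
# Hypothetical helper: mean and variance of the sample mean over all
# choose(N, 2) samples of size two, without storing the pairs.
pair_mean_moments <- function(x) {
  N <- length(x)
  s1 <- 0  # running sum of the sample means
  s2 <- 0  # running sum of the squared sample means
  for (i in 2:N) {
    m <- (x[i] + x[seq_len(i - 1)]) / 2  # means of samples {x[i], x[j]}, j < i
    s1 <- s1 + sum(m)
    s2 <- s2 + sum(m^2)
  }
  K <- choose(N, 2)
  c(mean = s1 / K, var = s2 / K - (s1 / K)^2)
}

x <- c(3, 7, 1, 9, 4)
pair_mean_moments(x)
```

This does N - 1 vectorized steps instead of N*(N-1)/2 interpreted ones. As a check, under simple random sampling without replacement the variance of the mean of a size-2 sample is (1 - 2/N) * var(x) / 2, and the mean of the sample means equals mean(x).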

  cheers
    Ben Bolker



More information about the R-devel mailing list