[R] how to subsample all possible combinations of n species taken 1:n at a time?

Mon Apr 6 17:39:21 CEST 2009

Hello

I apologise for the length of this entry but please bear with me.

In short:
I need a way of subsampling communities from all possible communities of n
taxa taken 1:n at a time without having to calculate all possible
combinations (because this gives me a memory error - using combn() or
expand.grid() at least). Does anyone know of a function? Or can you help me
edit the combn() or expand.grid() functions to generate subsamples?

In long:
I have been creating all possible communities of n taxa taken 1:n at a time
to get a presence/absence matrix of species occurrence in communities as
below...

Rows are samples, columns are species:

    A   B   C   D   .   .   .   .
    1   0   1   1   1   0   0   0   1   1   1   1   0   0   0   0
    0   1   1   1   1   0   0   0   1   1   1   1   0   0   0   0
    1   1   1   1   1   0   0   0   1   1   1   1   0   0   0   0
    0   0   0   0   0   1   0   0   1   1   1   1   0   0   0   0
    1   0   0   0   0   1   0   0   1   1   1   1   0   0   0   0
    0   1   0   0   0   1   0   0   1   1   1   1   0   0   0   0
    1   1   0   0   0   1   0   0   1   1   1   1   0   0   0   0
    0   0   1   0   0   1   0   0   1   1   1   1   0   0   0   0

...but the number of possible communities increases exponentially with each
added taxon. 

n <- 11               # number of taxa
sum(choose(n, 0:n))   # number of combos (equals 2^n, empty community included)

So the number of all possible combinations of 11 taxa is 2048 (this count
includes the empty community), for 12 taxa it is 4096, for 13 it is 8192,
and so on, such that when I reach about 25 taxa the number of combos is
33554432 and I get a memory error.
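
To put a rough number on that memory error, here is a back-of-the-envelope
sketch. It assumes every cell of the full table is stored as an 8-byte
double, which is how expand.grid() would store columns built from c(0, 1);
the variable names are only illustrative.

```r
# Approximate size of the complete presence/absence table for n taxa,
# assuming one 8-byte double per cell (an assumption, see above).
n <- 25
n_rows <- 2^n                        # 33554432 communities, empty one included
size_gib <- n_rows * n * 8 / 2^30    # cells * bytes per cell, in GiB
size_gib
```

Each extra taxon roughly doubles this figure, so the full table cannot be
held in memory for much larger n either.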

I have found that the number of combos of x taxa taken from a pool of n,
viewed as a function of x, follows a very kurtotic unimodal distribution...

x <- choose(20, 1:20)   # number of combos for each community size 1..20
plot(x)

...but have found that limiting the number of samples for any community size
to 1000 is good enough for the further analyses I wish to do.
My problem lies in sampling all possible combos without having to calculate
all possible combos. I have tried two methods but both give memory errors at
about 25 taxa.

The expand.grid() method:

n <- 11
# one c(0, 1) entry (absent/present) per taxon
titi <- rep(list(c(0, 1)), n)
# every combination of absences and presences: 2^n rows, n columns
tutu <- expand.grid(titi)
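
One possible way to subsample this grid without building it, given here only
as a sketch: each row of the grid corresponds to an integer between 0 and
2^n - 1 whose binary digits are the presences and absences, so random row
numbers can be drawn and decoded directly. The helper name decode_rows is
hypothetical, not an existing R function, and the sketch assumes n is small
enough for 2^n to be handled exactly by sample.int().

```r
# Hypothetical helper: turn 0-based grid row numbers into presence/absence rows.
# Column j + 1 holds binary digit j of the row number, which matches the
# ordering of expand.grid() (first column varies fastest).
decode_rows <- function(idx, n) {
  sapply(0:(n - 1), function(j) (idx %/% 2^j) %% 2)
}

set.seed(1)                         # only to make the sketch reproducible
n <- 25
m <- 1000                           # number of communities to draw
idx <- sample.int(2^n, m) - 1       # distinct random row numbers, 0-based
sub <- decode_rows(idx, n)          # m rows, n columns of 0/1
dim(sub)
```

Note that this draws uniformly from all 2^n communities, so most sampled
communities have close to n/2 taxa; it does not by itself cap the number of
samples per community size.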

The combn() method (a slightly lengthier function):

samplecommunityD <- function(n, numsamples)
{
  # presence/absence matrix, one row per community; starts with no rows
  super <- matrix(0, 0, n)
  for (numspploop in 1:n)
  {
    # all combinations of numspploop taxa, one per row
    # (this is the step that runs out of memory for large n)
    minor <- t(combn(n, numspploop))
    if (nrow(minor) < numsamples)
    {
      # few enough combinations: keep them all
      keep <- seq_len(nrow(minor))
    }
    else
    {
      # too many: draw numsamples distinct rows, once, before filling
      keep <- sample(nrow(minor), numsamples)
    }
    minot <- matrix(0, length(keep), n)
    for (loopi in seq_along(keep))
    {
      # mark the taxa present in this community
      minot[loopi, minor[keep[loopi], ]] <- 1
    }
    super <- rbind(super, minot)
  }
  # keep only communities with between 2 and n - 1 taxa
  super <- super[rowSums(super) >= 2 & rowSums(super) <= n - 1, , drop = FALSE]
  return(super)
}

samplecommunityD(11,1000)

So unless anyone knows of another function I could try, my next step would be
to modify the combn or expand.grid functions to generate subsamples, but
their code is beyond me at this stage (I'm a 3.5 month newbie). Can anyone
identify where in the code I would need to introduce a sampling term or
skipping sequence?
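
To make the request concrete, here is a minimal sketch of the behaviour being
asked for, under stated assumptions rather than a tested solution. The
function name sample_communities is hypothetical. It assumes n >= 3, keeps
community sizes 2 to n - 1 as in the function above, and replaces the
"enumerate, then sample" step with random draws of taxa via sample.int()
whenever a size has more than numsamples combinations.

```r
# Hypothetical helper (not an existing R function): up to `numsamples`
# communities per size, without enumerating the large sets of combinations.
sample_communities <- function(n, numsamples) {
  blocks <- lapply(2:(n - 1), function(k) {
    if (choose(n, k) <= numsamples) {
      # few enough combinations: enumerate them all, one per row
      members <- t(combn(n, k))
    } else {
      # too many: draw k distinct taxa at random, numsamples times
      members <- t(replicate(numsamples, sample.int(n, k)))
    }
    # convert lists of taxon numbers into a 0/1 presence/absence block
    pa <- matrix(0L, nrow(members), n)
    pa[cbind(rep(seq_len(nrow(members)), k), as.vector(members))] <- 1L
    pa
  })
  # the same community can be drawn twice by chance; keep one copy
  unique(do.call(rbind, blocks))
}

set.seed(1)                          # only to make the sketch reproducible
communities <- sample_communities(25, 1000)
dim(communities)
```

Because duplicate draws are dropped, a given size may end up with slightly
fewer than numsamples rows.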

Thanks for your time
Jasper
