[R] Randomising matrices

Sat Apr 28 20:23:39 CEST 2007

Nick Cutler <s0455078 <at> sms.ed.ac.uk> writes:

> 
> I would like to be able to randomise presence-absence (i.e. binary) 
> matrices whilst keeping both the row and column totals constant. Is 
> there a function in R that would allow me to do this?
> 
> I'm working with vegetation presence-absence matrices based on field 
> observations. The matrices are formatted to have sites as rows and 
> species as columns. The presence of a species on a site is indicated 
> with a 1 (absence is obviously indicated with a 0).
> 
> I would like to randomise the matrices many times in order to construct 
> null models. However, I cannot identify a function in R to do this, and 
> the programming looks tricky for someone of my limited skills.
> 
> Can anybody help me out?

Nick,

For a 1001 x 1001 matrix, this method takes less than 2 seconds on my 2 year old
Windows PC.

ronetab( marg1, marg2 ) returns a table of 0's and 1's according to the marginal
contraints. 

ck.ronetab( marg1, marg2 ) checks that all the constraints were honored.

msample <- function(x,marg)
{
  ## Purpose: sample at most one each from each cell of a margin
  ## ----------------------------------------------------------------------
  ## Arguments: x - number to sample, marg - a vector of integers
  ## ----------------------------------------------------------------------
  ## Author: Charles C. Berry, Date: 28 Apr 2007, 08:17
  ## GPL 2.0 or better

  if (!(x<=sum(marg) && all(marg>=0))) browser()
  wm <- which(marg!=0)
  if (length(wm)==1) {
    wm
  } else {
    sample( seq(along=marg), x, prob=marg )
  }
}

ronetab <- function(m1,m2,debug=F)
{
  ## Purpose: sample from a table with fixed margins and {0,1} cell values
  ## ----------------------------------------------------------------------
  ## Arguments: m1, m2 - two margins
  ## ----------------------------------------------------------------------
  ## Author: Charles C. Berry, Date: 28 Apr 2007, 08:21
  ## GPL 2.0 or better

  stopifnot( sum(m1)==sum(m2)|| max(m1)>length(m2) || max(m2)>length(m1) )

  i.list <- j.list <- list()
  k <- 0
  while( sum(m1)>0 ){
    k <- k+1
    if ( sum(m1!=0) > sum(m2!=0) ){
      i <- which.max( m1)
      j <- msample( m1[i], m2 )
      i.list[[ k ]] <- rep( i, m1[i] )
      j.list[[ k ]] <- j
      m1[i] <- 0
      m2[ j ] <- m2[ j ] - 1
    } else {
      j <- which.max( m2 )
      i <- msample( m2[j], m1 )
      i.list[[ k ]] <- i
      j.list[[ k ]] <- rep( j, m2[j] )
      m2[j] <- 0
      m1[ i ] <- m1[ i ] - 1
    }
  }
  res <- array(0, c(length(m1), length(m2) ) )
  res[ cbind( unlist(i.list), unlist(j.list) ) ] <- 1
  res
}

ck.ronetab <- function(m1,m2){
  tab <- ronetab(m1,m2)
  m1.ck <- all(m1==rowSums(tab))
  m2.ck <- all(m2==colSums(tab))
  cell.ck <- all( tab %in% 0:1 )
  res <- m1.ck & m2.ck & cell.ck
  if (!res) attr(res,"tab") <- tab
  res
}

I'll warn you that I have not worked through what looks to be a tedious proof
that this randomly samples matrices under the constraints. The heuristics seem
right, and a few simulation spot checks look reasonable. If you do not want to
trust it, you can still use it to generate a starting value for an MCMC run.

HTH,

Chuck