[R] Randomising matrices
Gavin Simpson
gavin.simpson at ucl.ac.uk
Fri Apr 27 15:19:45 CEST 2007
On Fri, 2007-04-27 at 09:45 +0100, Nick Cutler wrote:
Hi Nick
(Been meaning to reply to your private email to me but I've been away on
vacation and at meeting for several weeks)
> I would like to be able to randomise presence-absence (i.e. binary)
> matrices whilst keeping both the row and column totals constant. Is
> there a function in R that would allow me to do this?
I thought r2dtable() would suffice, but this doesn't return binary
matrices.
There appears to be a lot of literature on this - Zaman and Simberloff
(2002, Environmental and Ecological Statistics 9, 405--421) discuss many
previous attempts to do this and present another approach.
I've been interested in this for a little while so cooked up one of
their reviewed methods this morning. It works by choosing at random 2
rows and 2 columns of your matrix, leading to a 2x2 sub matrix of your
original matrix.
If this matrix is:
1 0
0 1
or
0 1
1 0
then you can swap the 0s and 1s and you haven't altered the row or
column sums any. You do this swap many times as a "burn in" period, and
then you can sample a random matrix by making one further swap.
The problem with this method is that it might not faithfully represent
the full universe of possible matrices (with row column constraints), in
that you might end up sampling only a string of matrices from a small
region of all possible matrices. One way to get round this is that
following the burn in, you then take a matrix only after the k+1th swap
- i.e. you make k swaps and then draw the matrix, rather than draw after
each swap. Whether you need the skip is debatable (Manly 1995, Ecology
76, 1109-1115).
The appended rBinMat() function implements this method (it is after
Roberts and Stone, 1990, Oecologia 83, 560--567), using a burn in period
of 1000 and a skip step of 100 as defaults that you can change. I have
no idea if these default are sufficient as I've not read
An example usage is:
> set.seed(1234)
> dat <- matrix(sample(c(0,1), 100, replace = TRUE), ncol = 10)
> system.time(ran.mat <- rBinMat(dat))
user system elapsed
0.923 0.002 0.953
> ran.mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 0 0 1 0 1 0 0 1 0
[2,] 1 0 0 1 1 1 0 0 1 0
[3,] 1 0 0 0 0 0 1 0 0 0
[4,] 1 1 1 0 1 0 1 1 0 1
[5,] 0 0 0 0 1 0 0 0 0 0
[6,] 1 1 1 1 1 1 1 1 1 1
[7,] 0 1 0 0 1 0 0 0 0 0
[8,] 1 0 1 0 0 1 0 0 0 0
[9,] 0 1 1 0 0 0 0 1 0 0
[10,] 1 0 0 1 1 1 1 1 1 1
> identical(rowSums(dat), rowSums(ran.mat))
[1] TRUE
> identical(colSums(dat), colSums(ran.mat))
[1] TRUE
The size of the matrix is not an issue, but rather the burn in required.
With larger matrices you need to do longer burn in, and make larger
skips perhaps. The 56 species by 28 islands example in Roberts and Stone
(1990), using a burn in of 100 000 (as per their paper), 56 seconds is
needed to get the first matrix and then 0.5 seconds to do the
recommended 1000 skips to get the next matrix. Subsequent matrices take
0.5 seconds to generate on my machine (see below).
rBinMat() returns a single matrix, so if you want to draw n random
matrices you need to run the function n times. But this is wasteful if
you do the burn in each time. So, generate the first matrix as such:
dat <- matrix(sample(c(0,1), 100, replace = TRUE), ncol = 10)
mat <- rBinMat(dat, burn.in = 1000, skip = 100)
then in your loop to generate stats on the NULL models, start from mat
and set burn.in to 0, e.g.
nit <- 1000
for(i in 1:nit) {
mat2 <- rBinMat(mat, burn.in = 0, skip = 100)
### other stats here on the null model
}
Note that there are other ways to do this and the paper Stéphane Dray
pointed you to plus the Zaman & Simberloff one I cite above look at
these in more detail.
HTH
G
Here is the function - some of it is a bit clunky, and can surely be
improved.
rBinMat <- function(x, burn.in = 10000, skip = 1000) {
## number rows/cols
n.col <- ncol(x)
n.row <- nrow(x)
## function to draw at random 2 rows and colums
## just returns the indices required
randDraw <- function(x, nr, nc) {
ran.row <- sample(nr, 2)
ran.col <- sample(nc, 2)
return(c(ran.row, ran.col))
}
## is the 2x2 matrix diagonal or anti-diagonal
isDiag <- function(x) {
X <- as.vector(x)
Diag <- aDiag <- FALSE
if(identical(X, c(1,0,0,1)))
return(TRUE)
else if(identical(X, c(0,1,1,0)))
return(TRUE)
else
return(FALSE)
}
changed <- 0
## do the burn in changes, then skip, then an extra change,
## this is then the first random matrix we want to draw
while(changed <= (burn.in + skip + 1)) {
want <- randDraw(x, n.row, n.col)
X <- x[want[1:2], want[3:4]]
if(isDiag(X)) {
x[want[1:2], want[3:4]] <- c(1,0)[X + 1]
changed <- changed + 1}
}
return(x)
}
>
> I'm working with vegetation presence-absence matrices based on field
> observations. The matrices are formatted to have sites as rows and
> species as columns. The presence of a species on a site is indicated
> with a 1 (absence is obviously indicated with a 0).
>
> I would like to randomise the matrices many times in order to construct
> null models. However, I cannot identify a function in R to do this, and
> the programming looks tricky for someone of my limited skills.
>
> Can anybody help me out?
>
> Many thanks,
>
> Nick Cutler
>
> Institute of Geography
> School of Geosciences
> University of Edinburgh
> Drummond Street
> Edinburgh EH8 9XP
> United Kingdom
>
> Tel: 0131 650 2532
> Web: http://www.geos.ed.ac.uk/homes/s0455078
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
More information about the R-help
mailing list