[R] random sampling with some limitive conditions?

Gavin Simpson gavin.simpson at ucl.ac.uk
Mon Jul 9 10:05:45 CEST 2007


Dear Jian,

This came up on R-help recently; I haven't got time to find the thread
for you, but it shouldn't take too much hunting out.

I suggested the following approach, based on the swap method of Roberts
and Stone (1990) Island-sharing by archipelago species. Oecologia 85:
560-567. The code for this is appended at the end of this message. Below
is an example:

> dat <- matrix(scan(), ncol = 8, byrow = TRUE)
1: 1 0 0 0 1 1 0 0
9: 0 1 1 1 0 1 0 1
17: 1 0 0 0 1 0 1 0
25: 0 0 0 1 0 1 0 1
33: 1 0 1 0 0 0 0 0
41: 0 1 0 1 1 1 1 1
49: 1 0 0 0 0 0 0 0
57: 0 0 0 1 0 1 0 1
65:
Read 64 items
> dat ## your data
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    0    0    0    1    1    0    0
[2,]    0    1    1    1    0    1    0    1
[3,]    1    0    0    0    1    0    1    0
[4,]    0    0    0    1    0    1    0    1
[5,]    1    0    1    0    0    0    0    0
[6,]    0    1    0    1    1    1    1    1
[7,]    1    0    0    0    0    0    0    0
[8,]    0    0    0    1    0    1    0    1

rBinMat is the function appended below. We search for elements of dat
that that are diagonal:

10  or 01
01     10

We do this by randomly selecting two rows and columns to find a 2x2
sub-matrix of dat. We check if it is diagonal, and if it is, we swap the
1's and 0's. This retains the row and column sums. If the sub-matrix is
non-diagonal, we simply leave it as it is. This process is repeated many
times in a "burn in" period, after which, random binary matrices can be
produced.

In the line below we use a burn in of 30000 swaps - which takes about 9
seconds to prime the matrix. (in fact this matrix is achieved after
making a further 1000 swaps as per argument skip = 1000)

> system.time(ran.mat1 <- rBinMat(dat, burn.in = 30000))
   user  system elapsed
  9.556   0.104   9.683
> ran.mat1
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    1    0    0    1    0    0    1
[2,]    0    1    1    0    1    1    1    0
[3,]    1    0    1    1    0    0    0    0
[4,]    0    0    0    1    0    1    0    1
[5,]    1    0    0    1    0    0    0    0
[6,]    1    0    0    1    1    1    1    1
[7,]    0    0    0    0    0    1    0    0
[8,]    1    0    0    0    0    1    0    1
> rowSums(ran.mat1)
[1] 3 5 3 3 2 6 1 3
> colSums(ran.mat1)
[1] 4 2 2 4 3 5 2 4

Having primed our random matrix, we can use this matrix and produce a
new random matrix from it by making a smaller number of swaps (default
1000), by setting the burn in to be 0.

> ran.mat2 <- rBinMat(ran.mat1, burn.in = 0)
> ran.mat2
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    0    0    0    1    1    0    0
[2,]    1    0    1    1    0    1    0    1
[3,]    1    0    1    0    0    0    1    0
[4,]    0    0    0    1    1    0    0    1
[5,]    0    1    0    0    0    0    0    1
[6,]    0    1    0    1    1    1    1    1
[7,]    0    0    0    0    0    1    0    0
[8,]    1    0    0    1    0    1    0    0
> rowSums(ran.mat2)
[1] 3 5 3 3 2 6 1 3
> colSums(ran.mat2)
[1] 4 2 2 4 3 5 2 4
>

As you can see, the row and column sums are as per your original matrix.

There are other methods to achieve these random binary matrices - some
of which were mentioned in the recent thread.

HTH

G

Ps: I cooked this up over lunch in response to the earlier thread. It
could no doubt be improved upon.

#######################################################################
##                                                                   ##
## rBinMat()                                                         ##
##                                                                   ##
## Function:                                                         ##
## =========                                                         ##
## Generates random binary matrices that preserve row and column     ##
## sums using the swap method of Roberts and Stone (1990) Island-    ##
## sharing by archipelago species. Oecologia 85: 560-567.            ##
##                                                                   ##
## Arguments:                                                        ##
## ==========                                                        ##
## x       : Species presence/absence matrix (binary, 1/0s)          ##
## burn.in : The "burn in" period required --- how many swaps to     ##
##           make to 'x' before a random matrix can be generated?    ##
## skip    : The number of swaps to skip before drawing a matrix.    ##
##           Once a matrix is burned in, it can be used in sub-      ##
##           sequent calls to rBinMat() [using "burn.in = 0"] for    ##
##           generating further random matrices. In such cases, one  ##
##           might wish to skip n swaps before drawing the next      ##
##           matrix.                                                 ##
##                                                                   ##
## Author  : Gavin L. Simpson                                        ##
## Date    : 27 April 2007                                           ##
## Licence : GPL Version 2                                           ##
##                                                                   ##
## Copyright (c) Gavin L. Simpson, 2007                              ##
##                                                                   ##
## History : 27-April-2007 - Created                                 ##
##                                                                   ##
#######################################################################
rBinMat <- function(x, burn.in = 10000, skip = 1000) {
  dim.nams <- dimnames(x)
  x <- as.matrix(x)
  dimnames(x) <- NULL
  ## number rows/cols
  n.col <- ncol(x)
  n.row <- nrow(x)
  ## is the 2x2 matrix diagonal or anti-diagonal
  isDiag <- function(x) {
    X <- as.vector(x)
    if(.Internal(identical(X, c(1,0,0,1))))
      return(TRUE)
    else if(.Internal(identical(X, c(0,1,1,0))))
      return(TRUE)
    else
      return(FALSE)
  }
  changed <- 0
  ## do the burn in changes, then skip, then a final swap,
  ## this is the first random matrix
  while(changed <= (burn.in + skip + 1)) {
    ran.row <- .Internal(sample(n.row, 2, replace = FALSE, prob = NULL))
    ran.col <- .Internal(sample(n.col, 2, replace = FALSE, prob = NULL))
    if(isDiag(x[ran.row, ran.col])) {
      x[ran.row, ran.col] <- c(1,0)[x[ran.row, ran.col] + 1]
      changed <- changed + 1}
  }
  dimnames(x) <- dim.nams
  return(x)
}


On Sun, 2007-07-08 at 11:12 -0600, Zhang Jian wrote:
> It is not right. My data is the presence-absence data. And I want to get
> thousands of presence-absence random data which length of rows and columns
> is the same with the former data. Meantime, the new data needs to have the
> fixed sums for each row and column with the former data.
> For example:
> The data "sites":
> site1 site2 site3 site4 site5 site6 site7 site8
> 1 0 0 0 1 1 0 0
> 0 1 1 1 0 1 0 1
> 1 0 0 0 1 0 1 0
> 0 0 0 1 0 1 0 1
> 1 0 1 0 0 0 0 0
> 0 1 0 1 1 1 1 1
> 1 0 0 0 0 0 0 0
> 0 0 0 1 0 1 0 1
> > apply(sites,2,sum)
> site1 site2 site3 site4 site5 site6 site7 site8
>     4     2     2     4     3     5     2     4
> > apply(sites,1,sum)
> [1] 3 5 3 3 2 6 1 3
> 
> If I get the new data "sites.random":
> site1 site2 site3 site4 site5 site6 site7 site8
> 1 0 0 0 1 1 0 0
> 1 1 1 1 0 1 0 0
> 1 0 0 0 1 0 1 0
> 0 0 0 1 0 1 0 1
> 0 0 1 0 0 0 0 1
> 0 1 0 1 1 1 1 1
> 1 0 0 0 0 0 0 0
> 0 0 0 1 0 1 0 1
> > apply(sites.random,2,sum) # the same with the former data
> site1 site2 site3 site4 site5 site6 site7 site8
>     4     2     2     4     3     5     2     4
> > apply(sites.random,1,sum) # the same with the former data
> [1] 3 5 3 3 2 6 1 3
> 
> How can I get the new random data? Thanks.
> 
> 
> 
> On 7/8/07, Alan Zaslavsky <zaslavsk at hcp.med.harvard.edu> wrote:
> >
> > If I understand your problem, this might be a solution.  Assign
> > independent random numbers for row and column and use the corresponding
> > ordering to assign the row and column indices.  Thus row and column
> > assignments are independent and the row and column totals are fixed.  If
> > cc and rr are respectively the desired row and column totals, with
> > sum(cc)==sum(rr), then
> >
> > n = sum(cc)
> > row.assign = rep(1:length(rr),rr)[order(runif(n))]
> > col.assign = rep(1:length(cc),cc)[order(runif(n))]
> >
> > If you want many such sets of random assignments to be generated at once
> > you can use a few more rep() calls in the expressions to generate multiple
> > sets in the same way.  (Do you actually want the assignments or just the
> > tables?) Of course there are many other possible solutions since you have
> > not fully specified the distribution you want.
> >
> >        Alan Zaslavsky
> >        Harvard U
> >
> > > From: "Zhang Jian" <jzhang1982 at gmail.com>
> > > Subject: [R] random sampling with some limitive conditions?
> > > To: r-help <r-help at stat.math.ethz.ch>
> > >
> > > I want to gain thousands of random sampling data by randomizing the
> > > presence-absence data. Meantime, one important limition is that the row
> > and
> > > column sums must be fixed. For example, the data "tst" is following:
> > >    site1 site2 site3 site4 site5 site6 site7 site8 1 0 0 0 1 1 0 0 0 1 1
> > 1 0
> > > 1 0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 1 0 1 1 1 1 1 1
> > 0 0
> > > 0 0 0 0 0 0 0 0 1 0 1 0 1
> > >
> > > sum(tst[1,]) = 3, sum(tst[,1])=4, and so on. When I randomize the data,
> > the
> > > first row sums must equal to 3, and the first column sums must equal to
> > 4.
> > > The rules need to be applied to each row and column.
> > > How to get the new random sampling data? I have no idea.
> > > Thanks.
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [t] +44 (0)20 7679 0522
ECRC                              [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building                  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK                        [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT                          [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list