[R] random sampling with some limitive conditions?
Gavin Simpson
gavin.simpson at ucl.ac.uk
Mon Jul 9 10:05:45 CEST 2007
Dear Jian,
This came up on R-help recently; I haven't got time to find the thread
for you, but it shouldn't take too much hunting out.
I suggested the following approach, based on the swap method of Roberts
and Stone (1990) Island-sharing by archipelago species. Oecologia 85:
560-567. The code for this is appended at the end of this message. Below
is an example:
> dat <- matrix(scan(), ncol = 8, byrow = TRUE)
1: 1 0 0 0 1 1 0 0
9: 0 1 1 1 0 1 0 1
17: 1 0 0 0 1 0 1 0
25: 0 0 0 1 0 1 0 1
33: 1 0 1 0 0 0 0 0
41: 0 1 0 1 1 1 1 1
49: 1 0 0 0 0 0 0 0
57: 0 0 0 1 0 1 0 1
65:
Read 64 items
> dat ## your data
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0 0 0 1 1 0 0
[2,] 0 1 1 1 0 1 0 1
[3,] 1 0 0 0 1 0 1 0
[4,] 0 0 0 1 0 1 0 1
[5,] 1 0 1 0 0 0 0 0
[6,] 0 1 0 1 1 1 1 1
[7,] 1 0 0 0 0 0 0 0
[8,] 0 0 0 1 0 1 0 1
rBinMat is the function appended below. We search for elements of dat
that that are diagonal:
10 or 01
01 10
We do this by randomly selecting two rows and columns to find a 2x2
sub-matrix of dat. We check if it is diagonal, and if it is, we swap the
1's and 0's. This retains the row and column sums. If the sub-matrix is
non-diagonal, we simply leave it as it is. This process is repeated many
times in a "burn in" period, after which, random binary matrices can be
produced.
In the line below we use a burn in of 30000 swaps - which takes about 9
seconds to prime the matrix. (in fact this matrix is achieved after
making a further 1000 swaps as per argument skip = 1000)
> system.time(ran.mat1 <- rBinMat(dat, burn.in = 30000))
user system elapsed
9.556 0.104 9.683
> ran.mat1
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 0 1 0 0 1 0 0 1
[2,] 0 1 1 0 1 1 1 0
[3,] 1 0 1 1 0 0 0 0
[4,] 0 0 0 1 0 1 0 1
[5,] 1 0 0 1 0 0 0 0
[6,] 1 0 0 1 1 1 1 1
[7,] 0 0 0 0 0 1 0 0
[8,] 1 0 0 0 0 1 0 1
> rowSums(ran.mat1)
[1] 3 5 3 3 2 6 1 3
> colSums(ran.mat1)
[1] 4 2 2 4 3 5 2 4
Having primed our random matrix, we can use this matrix and produce a
new random matrix from it by making a smaller number of swaps (default
1000), by setting the burn in to be 0.
> ran.mat2 <- rBinMat(ran.mat1, burn.in = 0)
> ran.mat2
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 0 0 0 1 1 0 0
[2,] 1 0 1 1 0 1 0 1
[3,] 1 0 1 0 0 0 1 0
[4,] 0 0 0 1 1 0 0 1
[5,] 0 1 0 0 0 0 0 1
[6,] 0 1 0 1 1 1 1 1
[7,] 0 0 0 0 0 1 0 0
[8,] 1 0 0 1 0 1 0 0
> rowSums(ran.mat2)
[1] 3 5 3 3 2 6 1 3
> colSums(ran.mat2)
[1] 4 2 2 4 3 5 2 4
>
As you can see, the row and column sums are as per your original matrix.
There are other methods to achieve these random binary matrices - some
of which were mentioned in the recent thread.
HTH
G
Ps: I cooked this up over lunch in response to the earlier thread. It
could no doubt be improved upon.
#######################################################################
## ##
## rBinMat() ##
## ##
## Function: ##
## ========= ##
## Generates random binary matrices that preserve row and column ##
## sums using the swap method of Roberts and Stone (1990) Island- ##
## sharing by archipelago species. Oecologia 85: 560-567. ##
## ##
## Arguments: ##
## ========== ##
## x : Species presence/absence matrix (binary, 1/0s) ##
## burn.in : The "burn in" period required --- how many swaps to ##
## make to 'x' before a random matrix can be generated? ##
## skip : The number of swaps to skip before drawing a matrix. ##
## Once a matrix is burned in, it can be used in sub- ##
## sequent calls to rBinMat() [using "burn.in = 0"] for ##
## generating further random matrices. In such cases, one ##
## might wish to skip n swaps before drawing the next ##
## matrix. ##
## ##
## Author : Gavin L. Simpson ##
## Date : 27 April 2007 ##
## Licence : GPL Version 2 ##
## ##
## Copyright (c) Gavin L. Simpson, 2007 ##
## ##
## History : 27-April-2007 - Created ##
## ##
#######################################################################
rBinMat <- function(x, burn.in = 10000, skip = 1000) {
dim.nams <- dimnames(x)
x <- as.matrix(x)
dimnames(x) <- NULL
## number rows/cols
n.col <- ncol(x)
n.row <- nrow(x)
## is the 2x2 matrix diagonal or anti-diagonal
isDiag <- function(x) {
X <- as.vector(x)
if(.Internal(identical(X, c(1,0,0,1))))
return(TRUE)
else if(.Internal(identical(X, c(0,1,1,0))))
return(TRUE)
else
return(FALSE)
}
changed <- 0
## do the burn in changes, then skip, then a final swap,
## this is the first random matrix
while(changed <= (burn.in + skip + 1)) {
ran.row <- .Internal(sample(n.row, 2, replace = FALSE, prob = NULL))
ran.col <- .Internal(sample(n.col, 2, replace = FALSE, prob = NULL))
if(isDiag(x[ran.row, ran.col])) {
x[ran.row, ran.col] <- c(1,0)[x[ran.row, ran.col] + 1]
changed <- changed + 1}
}
dimnames(x) <- dim.nams
return(x)
}
On Sun, 2007-07-08 at 11:12 -0600, Zhang Jian wrote:
> It is not right. My data is the presence-absence data. And I want to get
> thousands of presence-absence random data which length of rows and columns
> is the same with the former data. Meantime, the new data needs to have the
> fixed sums for each row and column with the former data.
> For example:
> The data "sites":
> site1 site2 site3 site4 site5 site6 site7 site8
> 1 0 0 0 1 1 0 0
> 0 1 1 1 0 1 0 1
> 1 0 0 0 1 0 1 0
> 0 0 0 1 0 1 0 1
> 1 0 1 0 0 0 0 0
> 0 1 0 1 1 1 1 1
> 1 0 0 0 0 0 0 0
> 0 0 0 1 0 1 0 1
> > apply(sites,2,sum)
> site1 site2 site3 site4 site5 site6 site7 site8
> 4 2 2 4 3 5 2 4
> > apply(sites,1,sum)
> [1] 3 5 3 3 2 6 1 3
>
> If I get the new data "sites.random":
> site1 site2 site3 site4 site5 site6 site7 site8
> 1 0 0 0 1 1 0 0
> 1 1 1 1 0 1 0 0
> 1 0 0 0 1 0 1 0
> 0 0 0 1 0 1 0 1
> 0 0 1 0 0 0 0 1
> 0 1 0 1 1 1 1 1
> 1 0 0 0 0 0 0 0
> 0 0 0 1 0 1 0 1
> > apply(sites.random,2,sum) # the same with the former data
> site1 site2 site3 site4 site5 site6 site7 site8
> 4 2 2 4 3 5 2 4
> > apply(sites.random,1,sum) # the same with the former data
> [1] 3 5 3 3 2 6 1 3
>
> How can I get the new random data? Thanks.
>
>
>
> On 7/8/07, Alan Zaslavsky <zaslavsk at hcp.med.harvard.edu> wrote:
> >
> > If I understand your problem, this might be a solution. Assign
> > independent random numbers for row and column and use the corresponding
> > ordering to assign the row and column indices. Thus row and column
> > assignments are independent and the row and column totals are fixed. If
> > cc and rr are respectively the desired row and column totals, with
> > sum(cc)==sum(rr), then
> >
> > n = sum(cc)
> > row.assign = rep(1:length(rr),rr)[order(runif(n))]
> > col.assign = rep(1:length(cc),cc)[order(runif(n))]
> >
> > If you want many such sets of random assignments to be generated at once
> > you can use a few more rep() calls in the expressions to generate multiple
> > sets in the same way. (Do you actually want the assignments or just the
> > tables?) Of course there are many other possible solutions since you have
> > not fully specified the distribution you want.
> >
> > Alan Zaslavsky
> > Harvard U
> >
> > > From: "Zhang Jian" <jzhang1982 at gmail.com>
> > > Subject: [R] random sampling with some limitive conditions?
> > > To: r-help <r-help at stat.math.ethz.ch>
> > >
> > > I want to gain thousands of random sampling data by randomizing the
> > > presence-absence data. Meantime, one important limition is that the row
> > and
> > > column sums must be fixed. For example, the data "tst" is following:
> > > site1 site2 site3 site4 site5 site6 site7 site8 1 0 0 0 1 1 0 0 0 1 1
> > 1 0
> > > 1 0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 0 0 1 0 1 1 1 1 1 1
> > 0 0
> > > 0 0 0 0 0 0 0 0 1 0 1 0 1
> > >
> > > sum(tst[1,]) = 3, sum(tst[,1])=4, and so on. When I randomize the data,
> > the
> > > first row sums must equal to 3, and the first column sums must equal to
> > 4.
> > > The rules need to be applied to each row and column.
> > > How to get the new random sampling data? I have no idea.
> > > Thanks.
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
More information about the R-help
mailing list