[R] Permutation of a distance matrix

Dave Roberts droberts at montana.edu
Wed Nov 28 00:40:52 CET 2007


Andy,

    As you have noted, explicit loops in R can be slow.  There 
are a couple of possible solutions.

1) Code the whole permutation routine in FORTRAN or C, so that R calls it 
only once.  If you don't know either of those languages, this won't help.

2) Avoid recalculating the raw distances and simply permute the existing 
distance matrix a suitable number of times, applying the same random 
ordering to both its rows and its columns.  E.g.

 > library(labdsv)
 > data(bryceveg)                      # example vegetation data in labdsv
 > dis.bc <- dsvdis(bryceveg,'bray')   # bray/curtis dissimilarity matrix
 > dis.mat <- as.matrix(dis.bc)
 > size <- nrow(dis.mat)
 > for (i in 1:999) {
 >     perm <- sample(size)            # one random ordering of 1..size
 >     tmp <- dis.mat[perm, perm]      # permute rows AND columns together
 >     # calculate the Mantel statistic (or whatever else) from tmp
 >  }

This still requires looping, but avoids calling ecodist on every iteration 
to recalculate distances that you already know.  Since sample() does its 
work in compiled code, it's pretty fast even inside a loop.  Indexing the 
rows and columns with the same permutation vector in a single step also 
avoids nested loops, which are really slow.  On my fairly old PC the above 
code took a few seconds, and dis.mat is 160x160.
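To fill in the "calculate mantel or whatever" step, here is a minimal sketch of a complete Mantel permutation test in base R.  It assumes both inputs are plain square, symmetric distance matrices of the same size (e.g. from as.matrix()).  The function name mantel_perm and the simulated data are hypothetical illustrations, not part of labdsv or ecodist.  It also checks the premise of option 2: permuting the raw data and recomputing distances gives the same matrix as permuting the existing distance matrix's rows and columns together.

```r
# Hypothetical helper: Mantel test by permuting an existing distance
# matrix instead of recomputing distances from the raw data.
# x, y: square, symmetric distance matrices of equal size.
mantel_perm <- function(x, y, nperm = 999) {
  stopifnot(nrow(x) == ncol(x), all(dim(x) == dim(y)))
  lower <- lower.tri(x)              # use each pair of objects once
  y.low <- y[lower]
  r.obs <- cor(x[lower], y.low)      # observed Mantel correlation
  size <- nrow(x)
  r.perm <- numeric(nperm)           # preallocate; don't grow in the loop
  for (i in seq_len(nperm)) {
    perm <- sample(size)             # one random ordering of 1..size
    xp <- x[perm, perm]              # same ordering for rows and columns
    r.perm[i] <- cor(xp[lower], y.low)
  }
  # one-sided p-value, counting the observed value as one permutation
  p <- (sum(r.perm >= r.obs) + 1) / (nperm + 1)
  list(r = r.obs, p = p, r.perm = r.perm)
}

# Simulated example data (assumption: not the bryceveg data)
set.seed(1)
coords <- matrix(runif(40), ncol = 2)        # 20 random points
env    <- coords[, 1] + rnorm(20, sd = 0.1)  # variable tied to x-coordinate
d.geo  <- as.matrix(dist(coords))
d.env  <- as.matrix(dist(env))

res <- mantel_perm(d.env, d.geo, nperm = 999)
print(res$r)
print(res$p)

# Permuting the raw rows and recomputing distances gives the same
# matrix as permuting the distance matrix rows and columns together
perm <- sample(20)
same <- all.equal(as.matrix(dist(coords[perm, ])), d.geo[perm, perm],
                  check.attributes = FALSE)
print(isTRUE(same))
```

Because the matrix is symmetric and is permuted with one index vector, only the lower triangle needs to be compared on each iteration.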

Dave Roberts

Andrew Park wrote:
> 
> Hi there,
> 
> I would like to find a more efficient way of permuting the rows and columns of a symmetrical matrix that represents ecological or actual distances between objects in space.  The permutation is of the type used in a Mantel test.
> 
> Specifically, the permutation has to accomplish something like this:
> 
> Original matrix addresses:
> 
> a11   a12   a13
> a21   a22   a23
> a31   a32   a33
> 
> Example permutation:
> 
> a22   a23   a21
> a32   a33   a31
> a12   a13   a11
> 
> that is, rows and columns are reordered in the same way, so the relative positions of rows and columns are conserved in the permutation.
> 
> Basically, I have been doing this in a "for" loop by (1) permuting the raw data vector using "sample", (2) generating a lower triangular distance matrix from the permuted raw data using the "distance" function from "ecodist", and (3) calculating a bunch of statistics including the Mantel correlation and multiple regression statistics, which are then stored in blank matrices that were declared prior to beginning the loop.  The whole procedure needs to be repeated at least 999 times, but 1999 times would be better and 9999 times would be ideal.
> 
> The problem, as R users will know, is that using "for" loops like this is slow, and gets slower the further into the loop you get.
> 
> However, I am not a sophisticated programmer, and cannot think of a more efficient way to do this.
> 
> Thanks in advance,
> 
> Andy Park (University of Winnipeg).
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 


-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts                                     office 406-994-4548
Professor and Head                                      FAX 406-994-3190
Department of Ecology                         email droberts at montana.edu
Montana State University
Bozeman, MT 59717-3460


