[R] [R-sig-Geo] Comparing spatial distributions - permutation test implementation

Thu May 21 11:40:19 CEST 2009

Hi,

Jose M. Blanco-Moreno an myself have implemented 
Syrjala's test in the ecespa package. As a matter 
of fact, Jose M. has implemented a Frotran 
version of Syrjala's original QBasic function. 
Using your data the results are very close to 
your reported # statistic= 0.224 and # p-value       = 0.1900.

Cheers,

Marcelino

# with Fortran code
 > syrjala0(dataCod[,1:2], dataCod$var1, dataCod$var2,R=F,nsim=1000)
Cramer-von Misses test for the difference between
the spatial distributions of  dataCod$var1 and dataCod$var2
based on 1000 permutations.

    psi:     0.223504
    p-value: 0.2167832

Kolmogorov-Smirnov test for the difference between
the spatial distributions of  dataCod$var1 and dataCod$var2
based on 1000 permutations.

    psi:     0.061354
    p-value: 0.2867133

### With R-code (a bit lengthy, so I use only 100 permutations)
 > syrjala0(dataCod[,1:2], dataCod$var1, dataCod$var2,R=T,nsim=100)
Syrjala test for the difference between the spatial distributions of
  dataCod$var1 and  dataCod$var2 , based on 100 simulations

    psi:      0.223504
    p-value:  0.2079208

At 17:34 20/05/2009, JiHO wrote:
>Hello everyone,
>
>I am looking at the joint spatial distribution of 2 kinds of organisms
>(estimated on a grid of points) and want to test for significant
>association or dissociation.
>
>My first question is: do you know a nice technique to do that,
>considering that I have a limited number of points (36) but that they
>are repeated (4 times)? I did GLMs to test for correlations between
>the two (hence forgetting about the spatial aspect of it) and was
>previously pointed to the SADIE software. Would there be anything
>explicitly spatial and available in R please?
>
>Then, Syrjala's test[1] seems appropriate and tests for differences in
>distribution. It computes a Cramér-von Mises-type statistic and tests
>its significance with a permutation test.
>I implemented the test in R and posted the code on these mailing
>lists[2]. Some people checked it and confirmed that the statistic
>gives correct results but my estimation of the p-value does not match
>the one predicted with the orignal software from Syrjala. I don't know
>what I am doing wrong. The permutation test is described by Syrjala as:
>
>         (...) Under the null hypothesis,
>         at a given sampling location (x_k, y_k), either density ob-
>         servation y_i(x_k, y_k), i = 1, 2, is equally likely for each
>         population. Thus, for a given data set, the distribution
>         of the test statistic can be constructed by calculating
>         the value of the test statistic for all 2^k pairwise per-
>         mutations of the data set. (...) The level of signif-
>         icance of a specific realization of the test statistic T is
>         determined from its position in the ordered set of test
>         statistic values from all 2^k permutations. (...)
>
>My understanding is that, for each permutation I should choose a
>random number of points (between 1 and k), swap the values for species
>1 and species 2 at those points, and recompute the test on the new
>data. But this does not work :/ . Here is my code and associated data
>from Syrjala (for which I have reference values). Any advice would be
>very welcome (in particular if there is a way to leverage boot() for
>this).
>NB: computing the 1000 permutations can be a bit lengthy, but
>fortunately, by using plyr, you get a nice progress bar to look at!
>
>syrjala.stat <- function(x, y=NULL, var1=NULL, var2=NULL)
>#
>#       Compute Syrjala statistic
>#       x, y                    coordinates
>#       var1, var2      value of 2 parameters both measured at (x,y) points
>#       NB: x can also be a data.frame/matrix containing x,y,var1,var2 as
>columns
>#
>{
>         # Input checks
>         if (!is.null(ncol(x))) {
>                 if (ncol(x) == 4) {
>                         names(x) = c("x","y","var1","var2")
>                         dat = x
>                 } else {
>                         stop("Wrong number of columns in argument x")
>                 }
>         } else {
>                 dat = data.frame(x, y, var1, var2)
>         }
>
>         # Normalize abundances
>         dat$var1 = dat$var1/sum(dat$var1)
>         dat$var2 = dat$var2/sum(dat$var2)
>
>         # For each point (each line of dat)
>         # compute the squared difference in gammas from each origin
>         meanSqDiff = apply(dat, 1, function(d, coord, variab) {
>                 north = (coord$x>=d[1])
>                 east = (coord$y>=d[2])
>                 south = (coord$x<=d[1])
>                 west = (coord$y<=d[2])
>                 return( mean( c(
>                         (diff(sapply(variab[(north & east),], sum)))^2,
>                         (diff(sapply(variab[(south & east),], sum)))^2,
>                         (diff(sapply(variab[(south & west),], sum)))^2,
>                         (diff(sapply(variab[(north & west),], sum)))^2
>                         ) )
>                 )
>         }, dat[,c("x","y")], dat[,c("var1","var2")])
>
>         # Compute the statistic (i.e. sum of mean squared differences)
>         return(sum(meanSqDiff))
>}
>
>
># Get data online : http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv
>system("curl http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv >
>syrjala_data_cod.csv")
>
>dataCod = read.csv(file = "syrjala_data_cod.csv", header = TRUE)
>
># Normalize abundances
>dataCod$var1 = dataCod$var1/sum(dataCod$var1)
>dataCod$var2 = dataCod$var2/sum(dataCod$var2)
>
># Number of permutations
>nperm = 1000
>
># Create nperm-1 replicates of the data (one is the original
>observation)
>d = rep(list(dataCod), nperm-1)
>
># Compute number of observations before to avoid doing that for every
>replicate
>n = nrow(dataCod)
>
>require(plyr)
># Permute some observations and compute the syrjala stat for each
>permutation
>psis = ldply(d, .fun=function(x, n){
>         # choose indices of observations to swap
>         idx = sample(1:n, runif(1, min=1, max=n))
>         # swap observations
>         x[idx, 3:4] = x[idx, 4:3]
>         # compute syrjala stat
>         return(syrjala.stat(x))
>}, n, .progress="text")
>}
>
># Compute the syrjala stat for the observations
>psi = syrjala.stat(dataCod)
>
># Estimate the pvalue
>pvalue = (sum(psis>=psi)+1)/nperm
>
>psi
>pvalue
># Should be:
># statistic     = 0.224
># p-value       = 0.1900
>
>Thank you very much in advance. Sincerely,
>
>JiHO
>---
>http://jo.irisson.free.fr/
>
>[1] A statistical test for a difference between the spatial
>distributions of two populations. Syrjala SE. Ecology. 1996;77(1):7580.
>http://dl.getdropbox.com/u/1047321/Syrjala1996.pdf
>
>[2] https://stat.ethz.ch/pipermail/r-sig-geo/2008-February/ thread.html#3137
>
>_______________________________________________
>R-sig-Geo mailing list
>R-sig-Geo at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/r-sig-geo

________________________________

Marcelino de la Cruz Rot

Departamento de  Biología Vegetal
E.U.T.I. Agrícola
Universidad Politécnica de Madrid
28040-Madrid
Tel.: 91 336 54 35
Fax: 91 336 56 56
marcelino.delacruz at upm.es