[R-sig-Geo] Comparing spatial distributions - permutation test implementation
Marcelino de la Cruz
marcelino.delacruz at upm.es
Thu May 21 11:40:19 CEST 2009
Hi,
Jose M. Blanco-Moreno an myself have implemented
Syrjala's test in the ecespa package. As a matter
of fact, Jose M. has implemented a Frotran
version of Syrjala's original QBasic function.
Using your data the results are very close to
your reported # statistic= 0.224 and # p-value = 0.1900.
Cheers,
Marcelino
# with Fortran code
> syrjala0(dataCod[,1:2], dataCod$var1, dataCod$var2,R=F,nsim=1000)
Cramer-von Misses test for the difference between
the spatial distributions of dataCod$var1 and dataCod$var2
based on 1000 permutations.
psi: 0.223504
p-value: 0.2167832
Kolmogorov-Smirnov test for the difference between
the spatial distributions of dataCod$var1 and dataCod$var2
based on 1000 permutations.
psi: 0.061354
p-value: 0.2867133
### With R-code (a bit lengthy, so I use only 100 permutations)
> syrjala0(dataCod[,1:2], dataCod$var1, dataCod$var2,R=T,nsim=100)
Syrjala test for the difference between the spatial distributions of
dataCod$var1 and dataCod$var2 , based on 100 simulations
psi: 0.223504
p-value: 0.2079208
At 17:34 20/05/2009, JiHO wrote:
>Hello everyone,
>
>I am looking at the joint spatial distribution of 2 kinds of organisms
>(estimated on a grid of points) and want to test for significant
>association or dissociation.
>
>My first question is: do you know a nice technique to do that,
>considering that I have a limited number of points (36) but that they
>are repeated (4 times)? I did GLMs to test for correlations between
>the two (hence forgetting about the spatial aspect of it) and was
>previously pointed to the SADIE software. Would there be anything
>explicitly spatial and available in R please?
>
>Then, Syrjala's test[1] seems appropriate and tests for differences in
>distribution. It computes a Cramér-von Mises-type statistic and tests
>its significance with a permutation test.
>I implemented the test in R and posted the code on these mailing
>lists[2]. Some people checked it and confirmed that the statistic
>gives correct results but my estimation of the p-value does not match
>the one predicted with the orignal software from Syrjala. I don't know
>what I am doing wrong. The permutation test is described by Syrjala as:
>
> (...) Under the null hypothesis,
> at a given sampling location (x_k, y_k), either density ob-
> servation y_i(x_k, y_k), i = 1, 2, is equally likely for each
> population. Thus, for a given data set, the distribution
> of the test statistic can be constructed by calculating
> the value of the test statistic for all 2^k pairwise per-
> mutations of the data set. (...) The level of signif-
> icance of a specific realization of the test statistic T is
> determined from its position in the ordered set of test
> statistic values from all 2^k permutations. (...)
>
>My understanding is that, for each permutation I should choose a
>random number of points (between 1 and k), swap the values for species
>1 and species 2 at those points, and recompute the test on the new
>data. But this does not work :/ . Here is my code and associated data
>from Syrjala (for which I have reference values). Any advice would be
>very welcome (in particular if there is a way to leverage boot() for
>this).
>NB: computing the 1000 permutations can be a bit lengthy, but
>fortunately, by using plyr, you get a nice progress bar to look at!
>
>syrjala.stat <- function(x, y=NULL, var1=NULL, var2=NULL)
>#
># Compute Syrjala statistic
># x, y coordinates
># var1, var2 value of 2 parameters both measured at (x,y) points
># NB: x can also be a data.frame/matrix containing x,y,var1,var2 as
>columns
>#
>{
> # Input checks
> if (!is.null(ncol(x))) {
> if (ncol(x) == 4) {
> names(x) = c("x","y","var1","var2")
> dat = x
> } else {
> stop("Wrong number of columns in argument x")
> }
> } else {
> dat = data.frame(x, y, var1, var2)
> }
>
> # Normalize abundances
> dat$var1 = dat$var1/sum(dat$var1)
> dat$var2 = dat$var2/sum(dat$var2)
>
> # For each point (each line of dat)
> # compute the squared difference in gammas from each origin
> meanSqDiff = apply(dat, 1, function(d, coord, variab) {
> north = (coord$x>=d[1])
> east = (coord$y>=d[2])
> south = (coord$x<=d[1])
> west = (coord$y<=d[2])
> return( mean( c(
> (diff(sapply(variab[(north & east),], sum)))^2,
> (diff(sapply(variab[(south & east),], sum)))^2,
> (diff(sapply(variab[(south & west),], sum)))^2,
> (diff(sapply(variab[(north & west),], sum)))^2
> ) )
> )
> }, dat[,c("x","y")], dat[,c("var1","var2")])
>
> # Compute the statistic (i.e. sum of mean squared differences)
> return(sum(meanSqDiff))
>}
>
>
># Get data online : http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv
>system("curl http://dl.getdropbox.com/u/1047321/syrjala_data_cod.csv >
>syrjala_data_cod.csv")
>
>dataCod = read.csv(file = "syrjala_data_cod.csv", header = TRUE)
>
># Normalize abundances
>dataCod$var1 = dataCod$var1/sum(dataCod$var1)
>dataCod$var2 = dataCod$var2/sum(dataCod$var2)
>
># Number of permutations
>nperm = 1000
>
># Create nperm-1 replicates of the data (one is the original
>observation)
>d = rep(list(dataCod), nperm-1)
>
># Compute number of observations before to avoid doing that for every
>replicate
>n = nrow(dataCod)
>
>require(plyr)
># Permute some observations and compute the syrjala stat for each
>permutation
>psis = ldply(d, .fun=function(x, n){
> # choose indices of observations to swap
> idx = sample(1:n, runif(1, min=1, max=n))
> # swap observations
> x[idx, 3:4] = x[idx, 4:3]
> # compute syrjala stat
> return(syrjala.stat(x))
>}, n, .progress="text")
>}
>
># Compute the syrjala stat for the observations
>psi = syrjala.stat(dataCod)
>
># Estimate the pvalue
>pvalue = (sum(psis>=psi)+1)/nperm
>
>psi
>pvalue
># Should be:
># statistic = 0.224
># p-value = 0.1900
>
>Thank you very much in advance. Sincerely,
>
>JiHO
>---
>http://jo.irisson.free.fr/
>
>[1] A statistical test for a difference between the spatial
>distributions of two populations. Syrjala SE. Ecology. 1996;77(1):7580.
>http://dl.getdropbox.com/u/1047321/Syrjala1996.pdf
>
>[2] https://stat.ethz.ch/pipermail/r-sig-geo/2008-February/ thread.html#3137
>
>_______________________________________________
>R-sig-Geo mailing list
>R-sig-Geo at stat.math.ethz.ch
>https://stat.ethz.ch/mailman/listinfo/r-sig-geo
________________________________
Marcelino de la Cruz Rot
Departamento de Biología Vegetal
E.U.T.I. Agrícola
Universidad Politécnica de Madrid
28040-Madrid
Tel.: 91 336 54 35
Fax: 91 336 56 56
marcelino.delacruz at upm.es
More information about the R-sig-Geo
mailing list