[R-sig-Geo] Beginner's question on choosing the correct test

Roger Bivand Roger.Bivand at nhh.no
Wed Aug 20 22:02:16 CEST 2014


On Wed, 20 Aug 2014, Tim Richter-Heitmann wrote:

> Hi there,
>
> I am new to spatial statistics, so please bear with me.
>
> My dataset consists of 60 plots, semi-randomly distributed on a 10 x 6 m
> area. We measured species data on 6 sampling dates, so I ended up with
> six different sample-by-species matrices.
> My first task is to evaluate the spatial autocorrelation for each
> of the species we found. I am also going to do variogramming and
> kriging based on the Moran's I results (is that a feasible approach? Or
> are correlograms and variograms redundant? I would like to have a
> single number to decide whether each of my species is spatially
> autocorrelated or not).
>
> I got very basic R code running from the spdep package:
>
> #my species data
> data <- read.table("species.txt", header = TRUE, sep = "\t", dec = ".")
> #x,y coordinates
> apr.D <- read.table("xy_april.txt", row.names=1, header = TRUE, sep =
> "\t", dec = ",")
> #april only (only 59 plots!)
> ap<-data[1:59,]
>
> library(spdep)
> nb <- tri2nb(apr.D)
> list <- nb2listw(nb)
> moran.test(ap$Ac2, list)
> moran.mc(ap$Ac2, list, nsim=999)
>
> For now, I have left every option that spdep offers at its default.
> Do you have any suggestions for things that really should be done during
> the process (for example, should the neighbour list be made differently?).
> As this particular species ("Ac2") is normally distributed, I end up with
> the same results for the Moran statistic.
> Another question: if all attempts at transformation fail to
> normalize a data series, can I work with Moran's I and variograms at
> all for that series?
>
> The problem is that I also tried another package, "ape".
> #create an inverse distance matrix (as suggested from some internet site)
> apr.Dis <- as.matrix(dist(apr.D))
> apr.Dis.Inv <- 1/as.matrix(dist(apr.D))
> diag(apr.Dis.Inv) <- 0
> library(ape)
> Moran.I(ap$Ac2, apr.Dis.Inv)
>
> And I get a different test statistic:
>
> _Output spdep_
>
> Moran's I test under randomisation
>
> data:  ap[, 1]
> weights: list
>
> Moran I statistic standard deviate = 11.7323, p-value < 2.2e-16
> alternative hypothesis: greater
> sample estimates:
> Moran I statistic       Expectation          Variance
>        0.89241144       -0.01724138        0.00601157
>
>
>
> _Output ape:_
>
> $observed
> [1] -0.003425159
>
> $expected
> [1] -0.01724138
>
> $sd
> [1] 0.02363168
>
> $p.value
> [1] 0.5587843
>

Using the example in the ape vignette:

body <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.469)
longevity <- c(4.74493, 3.3322, 3.3673, 2.89037, 2.302)
names(body) <- names(longevity) <- c("Homo", "Pongo", "Macaca", "Ateles",
  "Galago")
library(ape)
trnwk <- "((((Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.6)"
trnwk[2] <- ":0.38,Galago:1.00);"
tr <- read.tree(text = trnwk)
w <- 1/cophenetic(tr)
diag(w) <- 0
unlist(Moran.I(body, w, alternative="greater"))
library(spdep)
moran.test(body, mat2listw(w, style="W"), alternative="greater",
  randomisation=TRUE)

are the same. This data set is too small to show the consequences of
changing the spatial weights, but as your own output shows, those
consequences are often large, depending on the assumed underlying spatial
process. In one case you are using sparse graph neighbours (tri2nb), in
the other inverse distances between all pairs of plots, and these reflect
very different assumptions about that process. In general parsimonious
(sparse) representations are preferable to dense representations (inverse
distances).
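
A minimal base-R sketch of why the two weight choices can disagree so
strongly. It uses simulated coordinates and values (not the data from the
question), a hypothetical helper moran_i() that row-standardises its
weights, and an arbitrary k = 4 nearest-neighbour scheme standing in for
the sparse tri2nb graph:

```r
# Moran's I for values v and an n x n weights matrix W (base R only).
moran_i <- function(v, W) {
  W <- W / rowSums(W)              # row-standardise, as style="W" does
  z <- v - mean(v)
  (length(v) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

set.seed(1)                        # simulated data, an assumption
n  <- 59
xy <- cbind(runif(n, 0, 10), runif(n, 0, 6))
v  <- xy[, 1] + rnorm(n, sd = 0.5) # trend along the long axis plus noise

D <- as.matrix(dist(xy))

# Dense weights: inverse distance to every other plot
W_dense <- 1 / D
diag(W_dense) <- 0

# Sparse weights: only the k nearest plots count as neighbours
k <- 4
W_sparse <- matrix(0, n, n)
for (i in seq_len(n)) W_sparse[i, order(D[i, ])[2:(k + 1)]] <- 1

res <- c(sparse = moran_i(v, W_sparse), dense = moran_i(v, W_dense))
print(res)
```

With a smooth trend like this one, the sparse weights should give a
clearly larger I than the dense inverse-distance weights, because the
dense scheme compares each plot with every other plot in the area.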

data(eire)
moran.test(eire.df$OWNCONS, nb2listw(eire.nb, style="W"),
  randomisation=TRUE,  alternative="greater")
unlist(Moran.I(eire.df$OWNCONS, nb2mat(eire.nb, style="B"),
  alternative="greater"))

are the same, as are:

crds <- do.call("cbind", eire.coords.utm)
tnb <- tri2nb(crds)
moran.test(eire.df$OWNCONS, nb2listw(tnb, style="W"),
  randomisation=TRUE,  alternative="greater")
unlist(Moran.I(eire.df$OWNCONS, nb2mat(tnb, style="B"),
  alternative="greater"))

and:

t(apply(crds, 2, range))
dnb <- dnearneigh(crds, 0, 350)
dnb
distnb <- nbdists(dnb, crds)
idw <- lapply(distnb, function(x) 1/x)
moran.test(eire.df$OWNCONS, nb2listw(dnb, glist=idw, style="W"),
  randomisation=TRUE,  alternative="greater")
unlist(Moran.I(eire.df$OWNCONS, nb2mat(dnb, glist=idw, style="B"),
  alternative="greater"))

are effectively the same too. However, the results differ from one variant
of the spatial weights to the next.
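
The pairs above agree even though moran.test() is given style="W" weights
and Moran.I() a style="B" matrix. As far as can be told from the ape
source, Moran.I() divides each row of the matrix it receives by its row
sum before computing the statistic, which would explain the agreement.
The base-R sketch below illustrates the effect on a made-up 5 x 4 lattice
with rook neighbours; the helper moran_raw() and the simulated values are
assumptions for illustration only:

```r
# Moran's I using the weights exactly as supplied (no standardisation).
moran_raw <- function(v, W) {
  z <- v - mean(v)
  (length(v) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

# 5 x 4 lattice; cells at distance exactly 1 are rook neighbours
xy <- expand.grid(x = 1:5, y = 1:4)
B  <- (as.matrix(dist(xy)) == 1) * 1   # binary weights, like style="B"
W  <- B / rowSums(B)                   # row-standardised, like style="W"

set.seed(2)                            # simulated values, an assumption
v <- xy$x + rnorm(nrow(xy))

i_B     <- moran_raw(v, B)               # binary weights used as they are
i_W     <- moran_raw(v, W)               # row-standardised weights
i_B_std <- moran_raw(v, B / rowSums(B))  # binary, standardised before use

print(c(binary = i_B, row_std = i_W, binary_then_std = i_B_std))
```

The last two values are identical by construction, while the first
differs because lattice cells have 2, 3 or 4 neighbours. Passing
style="B" weights to moran.test() would therefore not be expected to
reproduce the Moran.I() result.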

Hope this clarifies,

Roger

>
>
> I understand that both coordinate matrices seem to be different, but as
> a beginner I have a very hard time deciding what is wrong or right.
> Curiously, the value for expected is the same, so I guess the calculus
> is correct, but maybe I am not aware of different approaches of the two
> packages? Either way, spdep makes me reject the null
> (alternative=greater), so I think there is a non-random spatial process
> underlying the data. On the other hand, ape makes me accept the null
> hypothesis of random spatial processes.
>
> Any help on this matter is highly appreciated!
>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no


