[R-sig-Geo] Moran Test for Spatial Autocorrelation - Significance credible or due to misspecification of the model?

Tue Jul 19 12:59:53 CEST 2011

Dear list,

I am new to spatial statistics and am working on the following: I have a dataset of annual grain yields over 9 years for each of 91 (irregularly spaced) areas. To begin with I averaged the yields over the 9 years, so that I have 91 data points, which I would now like to test for the presence of spatial autocorrelation via a global Moran I test (moran.test in package spdep). 

Here is how I proceeded (where in steps 2. onwards I am trying to follow Roger Bivand et al: Applied Spatial Data Analysis with R, ch.9): 

1. I obtained approximate geographical point coordinates (lat and long) for the 91 areas using the online service http://www.gpsvisualizer.com/geocoder/.

2. I created a spatial object:
myCRS <- CRS("+proj=longlat +ellps=WGS84")
mycoordmat <- matrix(c(mydata.agg$longitude,mydata.agg$latitude),ncol=2)
colnames(mycoordmat) <- c("long","lat")
areanames <- paste("SD",1:91,sep="")
rownames(mycoordmat) <- areanames
sum(duplicated(mycoordmat))	          # 7
mydata_sp <- SpatialPoints(mycoordmat, proj4string=myCRS)
coords <- coordinates(mydata_sp)

3. I created a distance based nbd:
mydata_nb1 <- knn2nb(knearneigh(coords, k=1, longlat=TRUE))
all.linked <- max(unlist(nbdists(mydata_nb1, coords, longlat=TRUE)))
mydata_nb <- dnearneigh(coords, 0, all.linked)

4. I created spatial weights for the nb object:
# approach A: default style W:
mydata_lw_W <- nb2listw(mydata_nb, style="W")

# approach A: binary style B:
mydata_lw_B <- nb2listw(mydata_nb, style="B")

# approach C: weights proportional to inverse distance btw. points representing the areas
# (believing that the strength of the neighbour relationship, i.e. (??) autocorrelation attenuates with distance):
dsts <- nbdists(mydata_nb, coords, longlat=TRUE)
idw <- lapply(dsts, function(x) 1/x)
mydata_lw_idwW <- nb2listw(mydata_nb, glist=idw, style="W")
summary(unlist(mydata_lw_idwW$weights))

5. Here are the resulting p-values from the global Moran tests:
# approach A: 
moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_W)
# Moran's I test under randomisation: p-value = 0.9967

# approach B:
moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_B)
# Moran's I test under randomisation: p-value = 0.9968

# approach C:
moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_idwW)
# Moran's I test under randomisation: p-value < 2.2e-16

Now my questions:

Approach C seems to suggests strong evidence for rejecting the null of no spatial autocorrelation, whereas the other two approaches show absolutely no evidence for rejecting the null of no spatial autocorrelation. Bivand et al. repeatedly warn that apparent spatial autocorrelation may actually stem from misspecification of the model, and I am wondering whether this is the case here. I do not have any auxiliary variables such as spatially patterned ecological variables that I could include in my model for the mean function. I can therefore only really alter the choice of neighbour relationships and spatial weights.

With regard to the neighbourhood relationships: I decided to choose distance based neighbours such that all areas have at least one neighbour (via all.linked). Contiguity-based or grid-based neighbours are not appropriate for my data; with graph-based neighbours I run into troubles because 7 of my coordinates are duplicated. Knn or higher order neighbours ran into issues when I tried to implement them. In light of these observations and my data, is my distance-based approach sensible?

With regard to the assignment of spatial weights: I would have believed that it is sensible to assume that the strength of potential spatial autocorrelation in yields attenuates with distance between the areas under consideration. I therefore favour approach C conceptually. However Bivand et al. warn that if we know little about the assumed spatial process we should avoid moving far from the binary representation of a wight of unity for neighbours, which is what styles W and B (approaches A and B) both implement as far as I understand? In light of these observations and given the test results, which approach seems to be most appropriate in my case?

I would be extremely grateful for any advice you could give me.
Thank you!

Julia