[R-sig-Geo] Moran Test for Spatial Autocorrelation - Significance credible or due to misspecification of the model?

Tue Jul 19 14:08:37 CEST 2011

On Tue, 19 Jul 2011, Julia Dechamps wrote:

> Dear list,
>
> I am new to spatial statistics and am working on the following: I have a 
> dataset of annual grain yields over 9 years for each of 91 (irregularly 
> spaced) areas. To begin with I averaged the yields over the 9 years, so 
> that I have 91 data points, which I would now like to test for the 
> presence of spatial autocorrelation via a global Moran I test 
> (moran.test in package spdep).
>
> Here is how I proceeded (where in steps 2. onwards I am trying to follow 
> Roger Bivand et al: Applied Spatial Data Analysis with R, ch.9):
>
> 1. I obtained approximate geographical point coordinates (lat and long) 
> for the 91 areas using the online service 
> http://www.gpsvisualizer.com/geocoder/.
>
> 2. I created a spatial object:
> myCRS <- CRS("+proj=longlat +ellps=WGS84")
> mycoordmat <- matrix(c(mydata.agg$longitude,mydata.agg$latitude),ncol=2)
> colnames(mycoordmat) <- c("long","lat")
> areanames <- paste("SD",1:91,sep="")
> rownames(mycoordmat) <- areanames
> sum(duplicated(mycoordmat))	          # 7
> mydata_sp <- SpatialPoints(mycoordmat, proj4string=myCRS)
> coords <- coordinates(mydata_sp)
>
> 3. I created a distance based nbd:
> mydata_nb1 <- knn2nb(knearneigh(coords, k=1, longlat=TRUE))
> all.linked <- max(unlist(nbdists(mydata_nb1, coords, longlat=TRUE)))
> mydata_nb <- dnearneigh(coords, 0, all.linked)

What is all.linked - is it a large distance (in km)? What does 
summary(mydata_nb) say? Are most points connected to each other?

>
> 4. I created spatial weights for the nb object:
> # approach A: default style W:
> mydata_lw_W <- nb2listw(mydata_nb, style="W")
>
> # approach A: binary style B:
> mydata_lw_B <- nb2listw(mydata_nb, style="B")
>
> # approach C: weights proportional to inverse distance btw. points representing the areas
> # (believing that the strength of the neighbour relationship, i.e. (??) autocorrelation attenuates with distance):
> dsts <- nbdists(mydata_nb, coords, longlat=TRUE)
> idw <- lapply(dsts, function(x) 1/x)
> mydata_lw_idwW <- nb2listw(mydata_nb, glist=idw, style="W")
> summary(unlist(mydata_lw_idwW$weights))
>
> 5. Here are the resulting p-values from the global Moran tests:
> # approach A:
> moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_W)
> # Moran's I test under randomisation: p-value = 0.9967
>
> # approach B:
> moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_B)
> # Moran's I test under randomisation: p-value = 0.9968
>

The two above are actually significant negative autocorrelation, if you 
change alternative to other than "greater" - the default. I think that 
almost all observations are connected to each other, leading to this 
effect.

> # approach C:
> moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_idwW)
> # Moran's I test under randomisation: p-value < 2.2e-16
>
> Now my questions:
>
> Approach C seems to suggests strong evidence for rejecting the null of 
> no spatial autocorrelation, whereas the other two approaches show 
> absolutely no evidence for rejecting the null of no spatial 
> autocorrelation. Bivand et al. repeatedly warn that apparent spatial 
> autocorrelation may actually stem from misspecification of the model, 
> and I am wondering whether this is the case here. I do not have any 
> auxiliary variables such as spatially patterned ecological variables 
> that I could include in my model for the mean function. I can therefore 
> only really alter the choice of neighbour relationships and spatial 
> weights.

No, see above.

>
> With regard to the neighbourhood relationships: I decided to choose 
> distance based neighbours such that all areas have at least one 
> neighbour (via all.linked). Contiguity-based or grid-based neighbours 
> are not appropriate for my data; with graph-based neighbours I run into 
> troubles because 7 of my coordinates are duplicated. Knn or higher order 
> neighbours ran into issues when I tried to implement them. In light of 
> these observations and my data, is my distance-based approach sensible?

Beware of inverse distance when points are duplicated, the effects may be 
perverse. Either aggregate the duplicated points, or disambiguate them - 
move them apart. Your determination of position is likely to incur 
measurement error when the geocoder defaults to an approximation. Can you 
get more precise coordinates by zooming in on for example Google Earth, 
marking points, and dumping to KML? The KML can be read with readOGR() in 
rgdal. With disambiguated points, you can use graph-based neighbours.

Hope this helps,

Roger

>
> With regard to the assignment of spatial weights: I would have believed 
> that it is sensible to assume that the strength of potential spatial 
> autocorrelation in yields attenuates with distance between the areas 
> under consideration. I therefore favour approach C conceptually. However 
> Bivand et al. warn that if we know little about the assumed spatial 
> process we should avoid moving far from the binary representation of a 
> wight of unity for neighbours, which is what styles W and B (approaches 
> A and B) both implement as far as I understand? In light of these 
> observations and given the test results, which approach seems to be most 
> appropriate in my case?
>
> I would be extremely grateful for any advice you could give me.
> Thank you!
>
> Julia
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no