[R-sig-Geo] Moran Test for Spatial Autocorrelation - Significance credible or due to misspecification of the model?

Tue Jul 19 15:46:07 CEST 2011

Dear Roger,

Thank you very much for your quick reply! Further to what you were asking ad 3.:

> all.linked
[1] 780.2701

> summary(mydata_nb)
Neighbour list object:
Number of regions: 91 
Number of nonzero links: 8168 
Percentage nonzero weights: 98.63543 
Average number of links: 89.75824 
Link number distribution:

87 88 89 90 
 4  3  4 80 
4 least connected regions:
8 10 12 15 with 87 links
80 most connected regions:
1 3 4 5 6 9 11 13 14 16 17 18 19 20 22 23 24 25 27 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 51 52 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 with 90 links 

So yes, most points seem connected to eachother. I repeated the tests (approaches A-C) with a two-sided alternative as well as a one-sided less than alternative and got the following results:

# approach A:
# default: Moran's I test under randomisation:
# "greater" p-value = 0.9967
# "two.sided" p-value = 0.00668
# "less" p-value = 0.00334

# approach B:
# binary: Moran's I test under randomisation:
# "greater" p-value = 0.9968
# "two-sided" p-value = 0.0064
# "less" p-value = 0.0032

# approach C:
# inverse: Moran's I test under randomisation:
# "greater" p-value < 2.2e-16
# "two.sided" p-value < 2.2e-16
# "less" p-value = 1

Why might the fact that most of the areas are connected lead to significant negative autocorrelation tough?
When aggregating the duplicated points (of which they are 7 in total) and repeating steps 2.-5. I end up with the following test results:

# approach A:
# default: Moran's I test under randomisation:
# "greater" p-value = 0.5
# "two.sided" p-value = 1
# "less" p-value = 0.5

# approach B:
# binary: Moran's I test under randomisation:
# "greater" p-value = NA
# "two-sided" p-value = 1
# "less" p-value = NA

# approach C:
# inverse: Moran's I test under randomisation:
# "greater" p-value < 2.2e-16
# "two.sided" p-value < 2.2e-16
# "less" p-value = 1

I think it is quite likely that the GPS coordinates I obtained are heavily approximated (the ares are quite large as they represent subdistricts in large states). I shall now try and obtain possibly more accurate coordinates for the 91 areas using the zoom on GoogleEarth...

I appreciate any further comments or suggestions you might have.

Regards,
Julia

________________________________________
From: Roger Bivand [Roger.Bivand at nhh.no]
Sent: 19 July 2011 13:08
To: Julia Dechamps
Cc: r-sig-geo at r-project.org
Subject: Re: [R-sig-Geo] Moran Test for Spatial Autocorrelation - Significance credible or due to misspecification of the model?

On Tue, 19 Jul 2011, Julia Dechamps wrote:

> Dear list,
>
> I am new to spatial statistics and am working on the following: I have a
> dataset of annual grain yields over 9 years for each of 91 (irregularly
> spaced) areas. To begin with I averaged the yields over the 9 years, so
> that I have 91 data points, which I would now like to test for the
> presence of spatial autocorrelation via a global Moran I test
> (moran.test in package spdep).
>
> Here is how I proceeded (where in steps 2. onwards I am trying to follow
> Roger Bivand et al: Applied Spatial Data Analysis with R, ch.9):
>
> 1. I obtained approximate geographical point coordinates (lat and long)
> for the 91 areas using the online service
> http://www.gpsvisualizer.com/geocoder/.
>
> 2. I created a spatial object:
> myCRS <- CRS("+proj=longlat +ellps=WGS84")
> mycoordmat <- matrix(c(mydata.agg$longitude,mydata.agg$latitude),ncol=2)
> colnames(mycoordmat) <- c("long","lat")
> areanames <- paste("SD",1:91,sep="")
> rownames(mycoordmat) <- areanames
> sum(duplicated(mycoordmat))             # 7
> mydata_sp <- SpatialPoints(mycoordmat, proj4string=myCRS)
> coords <- coordinates(mydata_sp)
>
> 3. I created a distance based nbd:
> mydata_nb1 <- knn2nb(knearneigh(coords, k=1, longlat=TRUE))
> all.linked <- max(unlist(nbdists(mydata_nb1, coords, longlat=TRUE)))
> mydata_nb <- dnearneigh(coords, 0, all.linked)

What is all.linked - is it a large distance (in km)? What does
summary(mydata_nb) say? Are most points connected to each other?

>
> 4. I created spatial weights for the nb object:
> # approach A: default style W:
> mydata_lw_W <- nb2listw(mydata_nb, style="W")
>
> # approach A: binary style B:
> mydata_lw_B <- nb2listw(mydata_nb, style="B")
>
> # approach C: weights proportional to inverse distance btw. points representing the areas
> # (believing that the strength of the neighbour relationship, i.e. (??) autocorrelation attenuates with distance):
> dsts <- nbdists(mydata_nb, coords, longlat=TRUE)
> idw <- lapply(dsts, function(x) 1/x)
> mydata_lw_idwW <- nb2listw(mydata_nb, glist=idw, style="W")
> summary(unlist(mydata_lw_idwW$weights))
>
> 5. Here are the resulting p-values from the global Moran tests:
> # approach A:
> moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_W)
> # Moran's I test under randomisation: p-value = 0.9967
>
> # approach B:
> moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_B)
> # Moran's I test under randomisation: p-value = 0.9968
>

The two above are actually significant negative autocorrelation, if you
change alternative to other than "greater" - the default. I think that
almost all observations are connected to each other, leading to this
effect.

> # approach C:
> moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_idwW)
> # Moran's I test under randomisation: p-value < 2.2e-16
>
> Now my questions:
>
> Approach C seems to suggests strong evidence for rejecting the null of
> no spatial autocorrelation, whereas the other two approaches show
> absolutely no evidence for rejecting the null of no spatial
> autocorrelation. Bivand et al. repeatedly warn that apparent spatial
> autocorrelation may actually stem from misspecification of the model,
> and I am wondering whether this is the case here. I do not have any
> auxiliary variables such as spatially patterned ecological variables
> that I could include in my model for the mean function. I can therefore
> only really alter the choice of neighbour relationships and spatial
> weights.

No, see above.

>
> With regard to the neighbourhood relationships: I decided to choose
> distance based neighbours such that all areas have at least one
> neighbour (via all.linked). Contiguity-based or grid-based neighbours
> are not appropriate for my data; with graph-based neighbours I run into
> troubles because 7 of my coordinates are duplicated. Knn or higher order
> neighbours ran into issues when I tried to implement them. In light of
> these observations and my data, is my distance-based approach sensible?

Beware of inverse distance when points are duplicated, the effects may be
perverse. Either aggregate the duplicated points, or disambiguate them -
move them apart. Your determination of position is likely to incur
measurement error when the geocoder defaults to an approximation. Can you
get more precise coordinates by zooming in on for example Google Earth,
marking points, and dumping to KML? The KML can be read with readOGR() in
rgdal. With disambiguated points, you can use graph-based neighbours.

Hope this helps,

Roger

>
> With regard to the assignment of spatial weights: I would have believed
> that it is sensible to assume that the strength of potential spatial
> autocorrelation in yields attenuates with distance between the areas
> under consideration. I therefore favour approach C conceptually. However
> Bivand et al. warn that if we know little about the assumed spatial
> process we should avoid moving far from the binary representation of a
> wight of unity for neighbours, which is what styles W and B (approaches
> A and B) both implement as far as I understand? In light of these
> observations and given the test results, which approach seems to be most
> appropriate in my case?
>
> I would be extremely grateful for any advice you could give me.
> Thank you!
>
> Julia
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

--
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no