[R-sig-Geo] Moran Test for Spatial Autocorrelation - Significance credible or due to misspecification of the model?

Tue Jul 19 16:15:16 CEST 2011

On Tue, 19 Jul 2011, Julia Dechamps wrote:

> Dear Roger,
>
> Thank you very much for your quick reply! Further to what you were asking ad 3.:
>
>> all.linked
> [1] 780.2701

Yes, this is the reason for the problems. I suggest improving the 
coordinate measurements first, then constructing a graph-based neighbour 
object, as your observations are very unevenly spaced. The construction of 
the weights depends on the kind of spatial process you envision, which may 
be contagion or perhaps diffusion in some form.

Note also that your temporal aggregation may smooth the data. I'm not 
aware of appropriate tests for spatio-temporal processes, and that may be 
what you really need. You may be aware that Cressie & Wikle (2011) has 
appeared, and that space-time data is a very active area at the moment.

Roger

>
>> summary(mydata_nb)
> Neighbour list object:
> Number of regions: 91
> Number of nonzero links: 8168
> Percentage nonzero weights: 98.63543
> Average number of links: 89.75824
> Link number distribution:
>
> 87 88 89 90
> 4  3  4 80
> 4 least connected regions:
> 8 10 12 15 with 87 links
> 80 most connected regions:
> 1 3 4 5 6 9 11 13 14 16 17 18 19 20 22 23 24 25 27 28 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 51 52 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 with 90 links
>
> So yes, most points seem connected to eachother. I repeated the tests (approaches A-C) with a two-sided alternative as well as a one-sided less than alternative and got the following results:
>
> # approach A:
> # default: Moran's I test under randomisation:
> # "greater" p-value = 0.9967
> # "two.sided" p-value = 0.00668
> # "less" p-value = 0.00334
>
> # approach B:
> # binary: Moran's I test under randomisation:
> # "greater" p-value = 0.9968
> # "two-sided" p-value = 0.0064
> # "less" p-value = 0.0032
>
> # approach C:
> # inverse: Moran's I test under randomisation:
> # "greater" p-value < 2.2e-16
> # "two.sided" p-value < 2.2e-16
> # "less" p-value = 1
>
> Why might the fact that most of the areas are connected lead to significant negative autocorrelation tough?
> When aggregating the duplicated points (of which they are 7 in total) and repeating steps 2.-5. I end up with the following test results:
>
> # approach A:
> # default: Moran's I test under randomisation:
> # "greater" p-value = 0.5
> # "two.sided" p-value = 1
> # "less" p-value = 0.5
>
> # approach B:
> # binary: Moran's I test under randomisation:
> # "greater" p-value = NA
> # "two-sided" p-value = 1
> # "less" p-value = NA
>
> # approach C:
> # inverse: Moran's I test under randomisation:
> # "greater" p-value < 2.2e-16
> # "two.sided" p-value < 2.2e-16
> # "less" p-value = 1
>
> I think it is quite likely that the GPS coordinates I obtained are heavily approximated (the ares are quite large as they represent subdistricts in large states). I shall now try and obtain possibly more accurate coordinates for the 91 areas using the zoom on GoogleEarth...
>
> I appreciate any further comments or suggestions you might have.
>
> Regards,
> Julia
>
>
>
>
> ________________________________________
> From: Roger Bivand [Roger.Bivand at nhh.no]
> Sent: 19 July 2011 13:08
> To: Julia Dechamps
> Cc: r-sig-geo at r-project.org
> Subject: Re: [R-sig-Geo] Moran Test for Spatial Autocorrelation - Significance credible or due to misspecification of the model?
>
> On Tue, 19 Jul 2011, Julia Dechamps wrote:
>
>> Dear list,
>>
>> I am new to spatial statistics and am working on the following: I have a
>> dataset of annual grain yields over 9 years for each of 91 (irregularly
>> spaced) areas. To begin with I averaged the yields over the 9 years, so
>> that I have 91 data points, which I would now like to test for the
>> presence of spatial autocorrelation via a global Moran I test
>> (moran.test in package spdep).
>>
>> Here is how I proceeded (where in steps 2. onwards I am trying to follow
>> Roger Bivand et al: Applied Spatial Data Analysis with R, ch.9):
>>
>> 1. I obtained approximate geographical point coordinates (lat and long)
>> for the 91 areas using the online service
>> http://www.gpsvisualizer.com/geocoder/.
>>
>> 2. I created a spatial object:
>> myCRS <- CRS("+proj=longlat +ellps=WGS84")
>> mycoordmat <- matrix(c(mydata.agg$longitude,mydata.agg$latitude),ncol=2)
>> colnames(mycoordmat) <- c("long","lat")
>> areanames <- paste("SD",1:91,sep="")
>> rownames(mycoordmat) <- areanames
>> sum(duplicated(mycoordmat))             # 7
>> mydata_sp <- SpatialPoints(mycoordmat, proj4string=myCRS)
>> coords <- coordinates(mydata_sp)
>>
>> 3. I created a distance based nbd:
>> mydata_nb1 <- knn2nb(knearneigh(coords, k=1, longlat=TRUE))
>> all.linked <- max(unlist(nbdists(mydata_nb1, coords, longlat=TRUE)))
>> mydata_nb <- dnearneigh(coords, 0, all.linked)
>
> What is all.linked - is it a large distance (in km)? What does
> summary(mydata_nb) say? Are most points connected to each other?
>
>>
>> 4. I created spatial weights for the nb object:
>> # approach A: default style W:
>> mydata_lw_W <- nb2listw(mydata_nb, style="W")
>>
>> # approach A: binary style B:
>> mydata_lw_B <- nb2listw(mydata_nb, style="B")
>>
>> # approach C: weights proportional to inverse distance btw. points representing the areas
>> # (believing that the strength of the neighbour relationship, i.e. (??) autocorrelation attenuates with distance):
>> dsts <- nbdists(mydata_nb, coords, longlat=TRUE)
>> idw <- lapply(dsts, function(x) 1/x)
>> mydata_lw_idwW <- nb2listw(mydata_nb, glist=idw, style="W")
>> summary(unlist(mydata_lw_idwW$weights))
>>
>> 5. Here are the resulting p-values from the global Moran tests:
>> # approach A:
>> moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_W)
>> # Moran's I test under randomisation: p-value = 0.9967
>>
>> # approach B:
>> moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_B)
>> # Moran's I test under randomisation: p-value = 0.9968
>>
>
> The two above are actually significant negative autocorrelation, if you
> change alternative to other than "greater" - the default. I think that
> almost all observations are connected to each other, leading to this
> effect.
>
>> # approach C:
>> moran.test(mydata.agg$yr1to9avg, listw=irrig_lw_idwW)
>> # Moran's I test under randomisation: p-value < 2.2e-16
>>
>> Now my questions:
>>
>> Approach C seems to suggests strong evidence for rejecting the null of
>> no spatial autocorrelation, whereas the other two approaches show
>> absolutely no evidence for rejecting the null of no spatial
>> autocorrelation. Bivand et al. repeatedly warn that apparent spatial
>> autocorrelation may actually stem from misspecification of the model,
>> and I am wondering whether this is the case here. I do not have any
>> auxiliary variables such as spatially patterned ecological variables
>> that I could include in my model for the mean function. I can therefore
>> only really alter the choice of neighbour relationships and spatial
>> weights.
>
> No, see above.
>
>>
>> With regard to the neighbourhood relationships: I decided to choose
>> distance based neighbours such that all areas have at least one
>> neighbour (via all.linked). Contiguity-based or grid-based neighbours
>> are not appropriate for my data; with graph-based neighbours I run into
>> troubles because 7 of my coordinates are duplicated. Knn or higher order
>> neighbours ran into issues when I tried to implement them. In light of
>> these observations and my data, is my distance-based approach sensible?
>
> Beware of inverse distance when points are duplicated, the effects may be
> perverse. Either aggregate the duplicated points, or disambiguate them -
> move them apart. Your determination of position is likely to incur
> measurement error when the geocoder defaults to an approximation. Can you
> get more precise coordinates by zooming in on for example Google Earth,
> marking points, and dumping to KML? The KML can be read with readOGR() in
> rgdal. With disambiguated points, you can use graph-based neighbours.
>
> Hope this helps,
>
> Roger
>
>>
>> With regard to the assignment of spatial weights: I would have believed
>> that it is sensible to assume that the strength of potential spatial
>> autocorrelation in yields attenuates with distance between the areas
>> under consideration. I therefore favour approach C conceptually. However
>> Bivand et al. warn that if we know little about the assumed spatial
>> process we should avoid moving far from the binary representation of a
>> wight of unity for neighbours, which is what styles W and B (approaches
>> A and B) both implement as far as I understand? In light of these
>> observations and given the test results, which approach seems to be most
>> appropriate in my case?
>>
>> I would be extremely grateful for any advice you could give me.
>> Thank you!
>>
>> Julia
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
> --
> Roger Bivand
> Department of Economics, NHH Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: Roger.Bivand at nhh.no
>
>

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no