[R-sig-Geo] Local Moran p-values in Geoda, Python, and R

Thu Mar 12 02:51:16 CET 2015

Here's the R LISA map:
http://willonrails.com/images/Lisa_map_R.png

And here's the Geoda version:
http://willonrails.com/images/Lisa_map_geoda.png

I'm using Geoda 1.6.6. Since we were getting inconsistent answers
between R and Geoda, the professor ran the analysis in Python to
get a third opinion. I don't have the Python version of the map
on me, but it's very similar to Geoda's.

I was able to export Geoda's local I and p-values. I put up a zip file
with all the data, R script, the maps, and the spatial weights files
for Geoda:
willonrails.com/images/lisa_spdep.zip

The local Moran's I in R and Geoda are very similar -- on average they
only differ by 0.000290044.

The p-values, on average, differ by 0.1808931.

In both I used 10 nearest neighbors for the weights. I did create the
weights separately in R and Geoda. The Geoda weights are in the zip
file. I'm new to coordinate projections, but here's the relevant part
of the R script:

------------------------------------------------
# nearest neighbors (nnb)
coords <- coordinates(nat)
IDs <- row.names(as(nat, "data.frame"))
nat_10nnb <- knn2nb(knearneigh(coords, k=10), row.names=IDs)
------------------------------------------------

'nat' is an object of class SpatialPolygonsDataFrame, so if I
understand correctly I don't need to set the 'longlat' argument in
knearneigh. I also plotted a random selection of the counties in R,
along with their neighboring counties from the weights matrix, and
everything appeared to be working correctly.

I also tried using Geoda's weights in R. Surprisingly, this map is
even sparser (higher p-values) than the first map I made in R. I'm
tempted to think I mixed up the data, but the map looks correct
when I plot all counties regardless of p-value.

----------------------------------------
> Date: Wed, 11 Mar 2015 19:28:11 +0100
> From: Roger.Bivand at nhh.no
> To: williamcmay at live.com
> CC: r-sig-geo at r-project.org
> Subject: Re: [R-sig-Geo] Local Moran p-values in Geoda, Python, and R
>
> On Wed, 11 Mar 2015, William May wrote:
>
>> I sent this to Roger Bivand earlier. Posting it here for
>> posterity, and in case other people have more to add:
>>
>> ---------------------------------------------------------------------------
>> Hi Roger,
>>
>> I'm taking a spatial econometrics class, and we've been making some
>> LISA maps. We noticed that the p-values from the spdep package seem to
>> be a lot different than the results from Geoda or from Python's PySAL
>> package.
>>
>> I looked through the function definitions and noticed that Python, and
>> I think Geoda, use simulations to estimate the p-value, while your
>> localmoran function uses an equation.
>>
>> I attached the graphs from Geoda and from R, and my R code and the
>> original dataset. [R code is added below.] Is this kind of
>> difference normal?
>
> If you would put the figures on a website, we could compare them. One
> possible reason for differences is the alternative= argument to
> localmoran(). It's default is "greater", but you could try "two.sided" to
> see whether this gives values closer to GeoDa. You don't say, by the way,
> which version of GeoDa you are using (since you mention Python, probably
> not legacy GeoDa), nor whether you can get the results out in tabular
> form. Are you using the same spatial weights - both can read and write GAL
> files. Are you aware that knearneigh() takes a longlat= argument, which
> must be used if the coordinates are not projected. The p-values in
> localmoran() are analytical ones from: Anselin, L. 1995. Local indicators
> of spatial association, Geographical Analysis, 27, 93--115. As you
> mention, in GeoDa they are by leave-one-out permutation bootstrapping.
>
> Roger
>
>>
>> Is there a reason to prefer spdep's results over the Geoda results (or
>> vice versa)?
>>
>> Thanks for any help,
>>
>> Will
>> ---------------------------------------------------------------------------
>>
>>
>> Here's his response:
>>
>> ---------------------------------------------------------------------------
>> Please *do* write to the R-sig-geo list rather than to me directly -
>> others can answer your question as well, perhaps better, and in a more
>> timely way than I can. In addition, threads in the list can be
>> searched in the archives, so others can avoid the same problem later.
>>
>> The only reliable test is localmoran.exact(), followed by
>> localmoran.sad(), for reasons given in the references in their help
>> pages.  If you test more than one observation, remember to adjust the
>> p-value for multiple comparisons. Sampling _i (all but i)
>> overrepresents non-neighbours of i for a test on i. The formulae used
>> are from the original paper and most often are similar to the
>> permutation bootstrap values.
>>
>> Crucially, almost all use of local Moran's I (and local Geary and
>> Getis) fall foul of the autocorrelation being related to a
>> mis-specified mean model (omitted explanatory variables and/or omitted
>> adjustment for global autocorrelation). Look at the references to
>> localmoran.sad and localmoran.exact for more details, and at the
>> relevant parts of Waller & Gotway (2004), Schabenberger & Gotway
>> (2005), and in Bivand et al. (2008, 2013, ASDAR-book, ch. 9).
>>
>> LISA maps are extremely misleading when the mean model is
>> mis-specified, and when the p-values are not adjusted for multiple
>> comparisons. They look simple, but because adjustment for multiple
>> comparisons is a subjective choice, you can almost choose the map you
>> want.
>>
>> Roger
>> ---------------------------------------------------------------------------
>>
>>
>> And here's my R code:
>>
>> ---------------------------------------------------------------------------
>> # make a LISA cluster map
>>
>> library(maptools)
>> library(spdep)
>>
>> setwd('/home/will/classes/polisci_610/1/')
>>
>> # get the data
>> nat <- readShapeSpatial("NAT.SHP", ID="FIPSNO")
>> nat.data<-data.frame(nat)
>> attach(nat.data)
>>
>> # nearest neighbors (nnb)
>> coords<-coordinates(nat)
>> IDs<-row.names(as(nat, "data.frame"))
>> nat_10nnb<-knn2nb(knearneigh(coords, k=10), row.names=IDs)
>>
>> # "W" identifies row standardized weights
>> nat_10nnb_w <- nb2listw(nat_10nnb, style="W")
>>
>> # can preset list_w object; if do this, remember to change if needed
>> list_w <- nat_10nnb_w
>>
>> # LISA values
>> lisa <- localmoran(HR60, list_w, zero.policy = T)
>>
>> # centers the variable of interest around its mean
>> cDV <- HR60 - mean(HR60)
>>
>> # centers the local Moran's around the mean
>> mI <- lisa[, 1]
>> C_mI <- mI - mean(mI) # but we don't want to center it! Only the sign
>> # matters.
>>
>> quadrant <- vector(mode="numeric",length=nrow(lisa))
>> quadrant[cDV>0 & mI>0] <- 1
>> quadrant[cDV <0 & mI>0] <- 2
>> quadrant[cDV>0 & mI<0] <- 3
>> quadrant[cDV <0 & mI<0] <- 4
>>
>> # set a statistical significance level for the local Moran's
>> signif <- 0.05
>>
>> # places non-significant Moran's in the category "5"
>> quadrant[lisa[, 5]> signif] <- 5
>>
>>
>> # map
>> png(file="Lisa_map_R.png", width = 680, res = 100)
>> colors <- c("red", "blue", "lightpink", "skyblue2", rgb(.95, .95, .95))
>> par(mar=c(0,0,1,0)) # sets margin parameters for plot space
>> plot(nat, border="grey", col=colors[quadrant],
>>       main = "LISA Cluster Map, 1960 Homicides")
>> legend("bottomright",legend=c("high-high","low-low","high-low","low-high"),
>>        fill=colors,bty="n",cex=0.7,y.intersp=1,x.intersp=1)
>> dev.off()
>> ---------------------------------------------------------------------------
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 91 00
> e-mail: Roger.Bivand at nhh.no