[R-sig-Geo] Local Moran p-values in Geoda, Python, and R
William May
williamcmay at live.com
Wed Mar 11 19:08:00 CET 2015
I sent this to Roger Bivand earlier. Posting it here for
posterity, and in case other people have more to add:
---------------------------------------------------------------------------
Hi Roger,
I'm taking a spatial econometrics class, and we've been making some
LISA maps. We noticed that the p-values from the spdep package seem to
be a lot different than the results from Geoda or from Python's PySAL
package.
I looked through the function definitions and noticed that Python, and
I think Geoda, use simulations to estimate the p-value, while your
localmoran function uses an equation.
I attached the graphs from Geoda and from R, and my R code and the
original dataset. [R code is added below.] Is this kind of
difference normal?
Is there a reason to prefer spdep's results over the Geoda results (or
vice versa)?
Thanks for any help,
Will
---------------------------------------------------------------------------
Here's his response:
---------------------------------------------------------------------------
Please *do* write to the R-sig-geo list rather than to me directly -
others can answer your question as well, perhaps better, and in a more
timely way than I can. In addition, threads in the list can be
searched in the archives, so others can avoid the same problem later.
The only reliable test is localmoran.exact(), followed by
localmoran.sad(), for reasons given in the references in their help
pages. If you test more than one observation, remember to adjust the
p-value for multiple comparisons. Sampling _i (all but i)
overrepresents non-neighbours of i for a test on i. The formulae used
are from the original paper and most often are similar to the
permutation bootstrap values.
Crucially, almost all use of local Moran's I (and local Geary and
Getis) fall foul of the autocorrelation being related to a
mis-specified mean model (omitted explanatory variables and/or omitted
adjustment for global autocorrelation). Look at the references to
localmoran.sad and localmoran.exact for more details, and at the
relevant parts of Waller & Gotway (2004), Schabenberger & Gotway
(2005), and in Bivand et al. (2008, 2013, ASDAR-book, ch. 9).
LISA maps are extremely misleading when the mean model is
mis-specified, and when the p-values are not adjusted for multiple
comparisons. They look simple, but because adjustment for multiple
comparisons is a subjective choice, you can almost choose the map you
want.
Roger
---------------------------------------------------------------------------
And here's my R code:
---------------------------------------------------------------------------
# make a LISA cluster map
library(maptools)
library(spdep)
setwd('/home/will/classes/polisci_610/1/')
# get the data
nat <- readShapeSpatial("NAT.SHP", ID="FIPSNO")
nat.data<-data.frame(nat)
attach(nat.data)
# nearest neighbors (nnb)
coords<-coordinates(nat)
IDs<-row.names(as(nat, "data.frame"))
nat_10nnb<-knn2nb(knearneigh(coords, k=10), row.names=IDs)
# "W" identifies row standardized weights
nat_10nnb_w <- nb2listw(nat_10nnb, style="W")
# can preset list_w object; if do this, remember to change if needed
list_w <- nat_10nnb_w
# LISA values
lisa <- localmoran(HR60, list_w, zero.policy = T)
# centers the variable of interest around its mean
cDV <- HR60 - mean(HR60)
# centers the local Moran's around the mean
mI <- lisa[, 1]
C_mI <- mI - mean(mI) # but we don't want to center it! Only the sign
# matters.
quadrant <- vector(mode="numeric",length=nrow(lisa))
quadrant[cDV>0 & mI>0] <- 1
quadrant[cDV <0 & mI>0] <- 2
quadrant[cDV>0 & mI<0] <- 3
quadrant[cDV <0 & mI<0] <- 4
# set a statistical significance level for the local Moran's
signif <- 0.05
# places non-significant Moran's in the category "5"
quadrant[lisa[, 5]> signif] <- 5
# map
png(file="Lisa_map_R.png", width = 680, res = 100)
colors <- c("red", "blue", "lightpink", "skyblue2", rgb(.95, .95, .95))
par(mar=c(0,0,1,0)) # sets margin parameters for plot space
plot(nat, border="grey", col=colors[quadrant],
main = "LISA Cluster Map, 1960 Homicides")
legend("bottomright",legend=c("high-high","low-low","high-low","low-high"),
fill=colors,bty="n",cex=0.7,y.intersp=1,x.intersp=1)
dev.off()
---------------------------------------------------------------------------
More information about the R-sig-Geo
mailing list