[R-sig-Geo] Local Moran p-values in Geoda, Python, and R

Roger Bivand Roger.Bivand at nhh.no
Wed Mar 11 19:28:11 CET 2015


On Wed, 11 Mar 2015, William May wrote:

> I sent this to Roger Bivand earlier. Posting it here for
> posterity, and in case other people have more to add:
>
> ---------------------------------------------------------------------------
> Hi Roger,
>
> I'm taking a spatial econometrics class, and we've been making some
> LISA maps. We noticed that the p-values from the spdep package seem to
> be a lot different than the results from Geoda or from Python's PySAL
> package.
>
> I looked through the function definitions and noticed that Python, and
> I think Geoda, use simulations to estimate the p-value, while your
> localmoran function uses an equation.
>
> I attached the graphs from Geoda and from R, and my R code and the
> original dataset. [R code is added below.] Is this kind of
> difference normal?

If you would put the figures on a website, we could compare them. One 
possible reason for differences is the alternative= argument to 
localmoran(). It's default is "greater", but you could try "two.sided" to 
see whether this gives values closer to GeoDa. You don't say, by the way, 
which version of GeoDa you are using (since you mention Python, probably 
not legacy GeoDa), nor whether you can get the results out in tabular 
form. Are you using the same spatial weights - both can read and write GAL 
files. Are you aware that knearneigh() takes a longlat= argument, which 
must be used if the coordinates are not projected. The p-values in 
localmoran() are analytical ones from: Anselin, L. 1995. Local indicators 
of spatial association, Geographical Analysis, 27, 93--115. As you 
mention, in GeoDa they are by leave-one-out permutation bootstrapping.

Roger

>
> Is there a reason to prefer spdep's results over the Geoda results (or
> vice versa)?
>
> Thanks for any help,
>
> Will
> ---------------------------------------------------------------------------
>
>
> Here's his response:
>
> ---------------------------------------------------------------------------
> Please *do* write to the R-sig-geo list rather than to me directly -
> others can answer your question as well, perhaps better, and in a more
> timely way than I can. In addition, threads in the list can be
> searched in the archives, so others can avoid the same problem later.
>
> The only reliable test is localmoran.exact(), followed by
> localmoran.sad(), for reasons given in the references in their help
> pages.  If you test more than one observation, remember to adjust the
> p-value for multiple comparisons. Sampling _i (all but i)
> overrepresents non-neighbours of i for a test on i. The formulae used
> are from the original paper and most often are similar to the
> permutation bootstrap values.
>
> Crucially, almost all use of local Moran's I (and local Geary and
> Getis) fall foul of the autocorrelation being related to a
> mis-specified mean model (omitted explanatory variables and/or omitted
> adjustment for global autocorrelation). Look at the references to
> localmoran.sad and localmoran.exact for more details, and at the
> relevant parts of Waller & Gotway (2004), Schabenberger & Gotway
> (2005), and in Bivand et al. (2008, 2013, ASDAR-book, ch. 9).
>
> LISA maps are extremely misleading when the mean model is
> mis-specified, and when the p-values are not adjusted for multiple
> comparisons. They look simple, but because adjustment for multiple
> comparisons is a subjective choice, you can almost choose the map you
> want.
>
> Roger
> ---------------------------------------------------------------------------
>
>
> And here's my R code:
>
> ---------------------------------------------------------------------------
> # make a LISA cluster map
>
> library(maptools)
> library(spdep)
>
> setwd('/home/will/classes/polisci_610/1/')
>
> # get the data
> nat <- readShapeSpatial("NAT.SHP", ID="FIPSNO")
> nat.data<-data.frame(nat)
> attach(nat.data)
>
> # nearest neighbors (nnb)
> coords<-coordinates(nat)
> IDs<-row.names(as(nat, "data.frame"))
> nat_10nnb<-knn2nb(knearneigh(coords, k=10), row.names=IDs)
>
> # "W" identifies row standardized weights
> nat_10nnb_w <- nb2listw(nat_10nnb, style="W")
>
> # can preset list_w object; if do this, remember to change if needed
> list_w <- nat_10nnb_w
>
> # LISA values
> lisa <- localmoran(HR60, list_w, zero.policy = T)
>
> # centers the variable of interest around its mean
> cDV <- HR60 - mean(HR60)
>
> # centers the local Moran's around the mean
> mI <- lisa[, 1]
> C_mI <- mI - mean(mI) # but we don't want to center it! Only the sign
> # matters.
>
> quadrant <- vector(mode="numeric",length=nrow(lisa))
> quadrant[cDV>0 & mI>0] <- 1
> quadrant[cDV <0 & mI>0] <- 2     
> quadrant[cDV>0 & mI<0] <- 3
> quadrant[cDV <0 & mI<0] <- 4
>
> # set a statistical significance level for the local Moran's
> signif <- 0.05
>
> # places non-significant Moran's in the category "5"
> quadrant[lisa[, 5]> signif] <- 5
>
>
> # map
> png(file="Lisa_map_R.png", width = 680, res = 100)
> colors <- c("red", "blue", "lightpink", "skyblue2", rgb(.95, .95, .95))
> par(mar=c(0,0,1,0)) # sets margin parameters for plot space
> plot(nat, border="grey", col=colors[quadrant],
>       main = "LISA Cluster Map, 1960 Homicides")
> legend("bottomright",legend=c("high-high","low-low","high-low","low-high"),
>        fill=colors,bty="n",cex=0.7,y.intersp=1,x.intersp=1)
> dev.off()
> ---------------------------------------------------------------------------
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no


More information about the R-sig-Geo mailing list