[R-sig-Geo] localmoran p-values with/without permutation

Roger Bivand Roger.Bivand at nhh.no
Sun Feb 15 16:10:32 CET 2009


On Sun, 15 Feb 2009, Valerio Bartolino wrote:

> Dear list,
> I aim to identify hotspot areas from a model prediction
> over a high-resolution grid. After calculating a spatial weights object I
> easily applied the localmoran function from the spdep package. The
> meaning of the p-values returned by the localmoran function is not
> really clear to me, nor is how much I can rely on them in terms of
> statistical significance. For instance, can I use these p-values instead
> of using a randomization approach? I would be glad for any clarification.

Yes, you can use the p-values - they are based on the same analytical 
randomisation approach as that for global Moran's I - see the references. 
This approach adjusts for the possible divergence of the observed data 
from normality with respect to kurtosis, but the p-values are tainted by 
multiple comparisons.
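
To illustrate the multiple-comparison point, here is a minimal sketch in base R. It uses simulated p-values rather than localmoran output; the vector name p_raw and the choice of the "holm" and "fdr" methods are assumptions made only for this example.

```r
set.seed(1)

# Simulated per-cell p-values under the null (no local clustering anywhere).
# In practice these would be the p-value column of the localmoran() result.
n <- 1000
p_raw <- runif(n)

# Adjust for the n simultaneous tests with base R's p.adjust()
p_holm <- p.adjust(p_raw, method = "holm")  # family-wise error control
p_fdr  <- p.adjust(p_raw, method = "fdr")   # false discovery rate control

# Number of cells flagged at the 0.01 level before and after adjustment
flagged <- c(raw  = sum(p_raw  < 0.01),
             holm = sum(p_holm < 0.01),
             fdr  = sum(p_fdr  < 0.01))
print(flagged)
```

With unadjusted p-values, roughly 1% of cells are expected to be flagged by chance alone even when there is no local structure, which is why the adjusted versions flag far fewer.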

By randomisation, also below, you seem to mean permutation bootstrapping 
(or Monte Carlo, or Hope-type test). Note that if you permute over all the 
data, you are not actually doing what you think that you are doing, 
because only the (small) set of neighbour values should be used for 
permutation, not all observations. The approaches may be equivalent if you 
know definitely that your model of the data (mean model and covariance 
model) is fully specified: there are no missing variables, all the 
variables have the correct functional forms, and there are no omitted 
global spatial processes. This is a very strong assumption, especially 
given the typical model of y ~ 1 (just the mean) used in Moran and local 
Moran tests.
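
One way to read the point about permuting only neighbour values is a conditional permutation: hold the value at location i fixed and redraw only the values placed in its neighbour set. The following is a rough base R sketch under that assumption, on a toy 5 x 5 grid with hand-built rook neighbours and row-standardised weights. All object names (nb, I_obs, p_cond) are hypothetical, and this is not the code used inside spdep.

```r
set.seed(42)

# Toy 5 x 5 grid; nb[[i]] holds the indices of the rook neighbours of cell i
nr <- 5; nc <- 5; n <- nr * nc
cell <- function(rw, cl) (rw - 1) * nc + cl
nb <- vector("list", n)
for (rw in seq_len(nr)) for (cl in seq_len(nc)) {
  cand <- rbind(c(rw - 1, cl), c(rw + 1, cl), c(rw, cl - 1), c(rw, cl + 1))
  ok <- cand[, 1] >= 1 & cand[, 1] <= nr & cand[, 2] >= 1 & cand[, 2] <= nc
  nb[[cell(rw, cl)]] <- cell(cand[ok, 1], cand[ok, 2])
}

z <- rnorm(n)                      # stand-in for the model predictions
zbar <- mean(z)
m2 <- sum((z - zbar)^2) / n

# Local Moran's I with row-standardised weights (mean of neighbour deviations)
local_I <- function(zi, z_nb) (zi - zbar) / m2 * mean(z_nb - zbar)
I_obs <- sapply(seq_len(n), function(i) local_I(z[i], z[nb[[i]]]))

# Conditional permutation: z[i] stays put, neighbour values are redrawn
# without replacement from the remaining n - 1 observations
nsim <- 499
p_cond <- numeric(n)
for (i in seq_len(n)) {
  k <- length(nb[[i]])
  I_sim <- replicate(nsim, local_I(z[i], sample(z[-i], k)))
  # one-sided pseudo p-value for positive local autocorrelation
  p_cond[i] <- (sum(I_sim >= I_obs[i]) + 1) / (nsim + 1)
}
```

The pseudo p-values in p_cond are still subject to the multiple-comparison problem, and they inherit the assumption that the mean-only model is adequate.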

Instead, it may be safer to do parametric bootstrapping, drawing from the 
actual distribution of observations for the small neighbour set - this 
also permits other approaches to be examined. See Waller & Gotway (2004) 
p. 239 for a discussion. In fact, you can actually use localmoran.sad() 
for a Saddlepoint approximation, or localmoran.exact() for the exact test, 
which are typically similar to the analytical randomisation approach for 
much of the range of the statistic, but perform much better where 
discrimination is needed, and are pretty fast, so speed is not an issue.
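
A minimal sketch of calling the two functions follows, assuming a toy 5 x 5 grid built with cell2nb() and a mean-only lm() fit. The data are simulated, the object names are hypothetical, and the argument usage reflects my reading of the spdep help pages, so check it against the installed version.

```r
library(spdep)
set.seed(1)

nb <- cell2nb(5, 5)          # rook neighbours on a 5 x 5 grid
z <- rnorm(length(nb))       # stand-in for the variable of interest
lm0 <- lm(z ~ 1)             # mean-only model, as in the usual Moran tests
sel <- seq_along(nb)         # test every location

# Saddlepoint approximation and exact test for local Moran's I
res_sad <- localmoran.sad(lm0, select = sel, nb = nb, style = "W")
res_ex  <- localmoran.exact(lm0, select = sel, nb = nb, style = "W")

# Tabulate the per-location results (assumes the as.data.frame methods
# documented for these result classes are available)
head(as.data.frame(res_sad))
head(as.data.frame(res_ex))
```

A model with covariates can be passed instead of the mean-only fit if the mean model needs more than an intercept.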

This expands on Danlin Yu's helpful comments; I share his concerns about 
using unadjusted p-values.

If you want to look at the hotspot literature more closely, see Chapter 7 
in Waller & Gotway, and perhaps review implementations of relevant methods 
in the DCluster package.

Hope this helps,

Roger

>
> Moreover, I also want to calculate statistical significance through a
> randomization approach (commonly used with Moran's I statistic). The
> idea behind the randomization is rather simple, and the coding doesn't
> seem too difficult either, but the identified hotspots appear larger and
> more disaggregated than those identified from the p-values provided
> by the localmoran function at a similar significance level.
>
> Did I make a mistake in the following code I wrote for the permutation?
> Thanks for any advice, explanation or comment you may have.
>
> Valerio Bartolino
>
> ###########################################
> require(spdep)
>
> locMoranI.perm <- function(x, R, listw, ...){
>
> # x is a vector of the values on which to calculate the local Moran's I statistic
> # R is the number of permutations; listw and ... are passed to localmoran
>
> 	# one row per permutation, one column per observation
> 	mat <- matrix(data=NA, nrow=R, ncol=length(x))
> 	for(i in seq_len(R)){
> 		perm <- sample(x, replace=FALSE)	# permutes over all observations
> 		mat[i,] <- localmoran(perm, listw, ...)[,1]	# keep the Ii column
> 	}
>
> 	return(mat)
> }
>
> # I used this new function as follows:
> nsim <- 1000
> I.perm <- locMoranI.perm(z, R=nsim, listw=nbw)
>
> MorI <- localmoran(z, listw=nbw)
>
> # select for instance a 0.01 pseudo-significance level
> p.perm <- apply(I.perm, 2, quantile, probs=0.99)
>
> ## because local Moran's I identifies spatial clustering,
> ## high and low hotspots do not have distinct I values;
> ## make a vector to pick out the significant high hotspots
> hot <- ifelse(p.perm-MorI[,1]<0 & z>mean(z),1,0)
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


