[R-sig-Geo] question on raster Moran's I statistical significance

Thu Oct 21 10:40:18 CEST 2021

On Thu, 21 Oct 2021, Gabriel Cotlier wrote:

> Hello
>
> Thank you very much.
> I have large raster layers and would like to ask, in order to reduce the
> processing time of the simulation, choosing a smaller nsim value could help
> ? and if so, what could be the minimum nsim value recommended?

If you examine the code, you see that raster::Moran() always performs 
multiple raster::focal() calls, effectively constructing the spatial 
weights on each run. The problem is not nsim, it is the way that 
raster::Moran() is constructed.

Indeed, using a Hope-type test gives the same inferential outcome as using 
asymptotic methods for global tests for spatial autocorrelation, and is 
superfluous, but no other approach is feasible for raster::Moran() (which 
by the way only seems to use 4 neighbours although claiming it uses 8).

1. How large is large?

It is possible to generate nb neighbour objects for large data sets, 
making the use of spdep::moran.test() feasible, certainly for tens of 
thousands of observations (all the census tracts in coterminous US, for 
example).

This uses distances between raster cell centres to find neighbours:

> library(spdep)
Loading required package: sp
Loading required package: spData
Loading required package: sf
Linking to GEOS 3.10.0, GDAL 3.3.2, PROJ 8.1.1
> crds <- expand.grid(x=1:800, y=1:1200)
> dim(crds)
[1] 960000      2
> grd <- st_as_sf(crds, coords=c("x", "y"))
> grd$z <- runif(nrow(grd))
> system.time(dnbr <- dnearneigh(grd, 0, 1.01))
    user  system elapsed
  30.065   0.235  30.381
> dnbr
Neighbour list object:
Number of regions: 960000
Number of nonzero links: 3836000
Percentage nonzero weights: 0.0004162326
Average number of links: 3.995833
> system.time(dnbq <- dnearneigh(grd, 0, 1.42))
    user  system elapsed
  54.502   0.080  54.699
> dnbq
Neighbour list object:
Number of regions: 960000
Number of nonzero links: 7668004
Percentage nonzero weights: 0.0008320317
Average number of links: 7.987504

Once the neighbour objects are ready, conversion to spatial weights 
objects takes some time, and computing I with a constant mean model 
depends on the numbers of neighbours:

> system.time(lwr <- nb2listw(dnbr, style="B"))
    user  system elapsed
   6.694   0.000   6.722
> system.time(Ir <- moran.test(grd$z, lwr))
    user  system elapsed
  24.371   0.000  24.470
> Ir

 	Moran I test under randomisation

data:  grd$z
weights: lwr

Moran I statistic standard deviate = -0.06337, p-value = 0.5253
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     -4.679899e-05     -1.041668e-06      5.213749e-07

> system.time(lwq <- nb2listw(dnbq, style="B"))
    user  system elapsed
   6.804   0.000   6.828
> system.time(Iq <- moran.test(grd$z, lwq))
    user  system elapsed
  46.703   0.012  46.843
> Iq

 	Moran I test under randomisation

data:  grd$z
weights: lwq

Moran I statistic standard deviate = -0.70373, p-value = 0.7592
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     -3.604417e-04     -1.041668e-06      2.608222e-07

2. The larger your N, the less likely that the test means anything at all, 
because the assumption is that the observed entities are not simply 
arbitrary products of, say, resolution.

If you think of global Moran's I as a specification test of a regression 
of the variable of interest on the constant (the mean model is just the 
constant), for raster data the resolution controls the outcome 
(downscaling/upscaling will shift Moran's I). If you include covariates, 
patterning in the residuals of a richer model may well abate.

Hope this clarifies,

Roger

> Thanks a lot again.
> Kind regards,
>
>
> On Wed, Oct 20, 2021 at 8:45 PM Roger Bivand <Roger.Bivand using nhh.no> wrote:
>
>> On Wed, 20 Oct 2021, Gabriel Cotlier wrote:
>>
>>> Hello,
>>>
>>> I would like to estimate the Moran's *I* coefficient for raster data and
>>> together with the statical significance of the spatial autocorrelation
>>> obtained.
>>>
>>> I found that the raster package function Moran() although calculates the
>>> spatial autocorrelation index it apparently does not give directly the
>>> statical significance of the results obtained :
>>> https://search.r-project.org/CRAN/refmans/raster/html/autocor.html
>>>
>>> Could it be be possible to obtain the statistical significance of the
>>> results with either raster package or similar one?
>>
>>
>> fortunes::fortune("This is R")
>>
>>
>> library(raster)
>> r <- raster(nrows=10, ncols=10)
>> values(r) <- 1:ncell(r)
>> f <- matrix(c(0,1,0,1,0,1,0,1,0), nrow=3)
>> (rI <- Moran(r, f))
>> r1 <- r
>> nsim <- 499
>> res <- numeric(nsim)
>> set.seed(1)
>> for (i in 1:nsim) {
>>    values(r1) <- values(r)[sample(prod(dim(r)))]
>>    res[i] <- Moran(r1, f)
>> }
>>
>> Hope-type tests date back to Cliff and Ord; they are permutation
>> bootstraps.
>>
>> r_g <- as(r, "SpatialPixelsDataFrame")
>> library(spdep)
>> nb <- poly2nb(as(r_g, "SpatialPolygons"), queen=FALSE)
>> set.seed(1)
>> o <- moran.mc(r_g$layer, nb2listw(nb, style="B"), nsim=nsim,
>>    return_boot=TRUE)
>> x_a <- range(c(o$t, o$t0, res, rI))
>> plot(density(o$t), xlim=x_a)
>> abline(v=o$t0)
>> lines(density(res), lty=2)
>> abline(v=rI, lty=2)
>>
>> It is not immediately obvious from the code of raster::Moran() why it is
>> different, possibly because of padding the edges of the raster and thus
>> increasing the cell count.
>>
>> For added speed, the bootstrap can be parallelized in both cases; polygon
>> boundaries are perhaps not ideal.
>>
>> Hope this clarifies. Always provide a reproducible example, never post
>> HTML
>> mail.
>>
>> Roger Bivand
>>
>>
>>>
>>> Thanks a lot.
>>> Kind regards,
>>> Gabriel
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo using r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>
>> --
>> Roger Bivand
>> Emeritus Professor
>> Department of Economics, Norwegian School of Economics,
>> Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
>> e-mail: Roger.Bivand using nhh.no
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>
>
>
>

-- 
Roger Bivand
Emeritus Professor
Department of Economics, Norwegian School of Economics,
Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en