[R-sig-Geo] question on raster Moran's I statistical significance

Thu Oct 21 12:13:36 CEST 2021

Hello
Thanks a lot.
Indeed your explanation makes it much more clear.
Kind regards,

On Thu, Oct 21, 2021 at 11:40 AM Roger Bivand <Roger.Bivand using nhh.no> wrote:

> On Thu, 21 Oct 2021, Gabriel Cotlier wrote:
>
> > Hello
> >
> > Thank you very much.
> > I have large raster layers and would like to ask, in order to reduce the
> > processing time of the simulation, choosing a smaller nsim value could
> help
> > ? and if so, what could be the minimum nsim value recommended?
>
> If you examine the code, you see that raster::Moran() always performs
> multiple raster::focal() calls, effectively constructing the spatial
> weights on each run. The problem is not nsim, it is the way that
> raster::Moran() is constructed.
>
> Indeed, using a Hope-type test gives the same inferential outcome as using
> asymptotic methods for global tests for spatial autocorrelation, and is
> superfluous, but no other approach is feasible for raster::Moran() (which
> by the way only seems to use 4 neighbours although claiming it uses 8).
>
> 1. How large is large?
>
> It is possible to generate nb neighbour objects for large data sets,
> making the use of spdep::moran.test() feasible, certainly for tens of
> thousands of observations (all the census tracts in coterminous US, for
> example).
>
> This uses distances between raster cell centres to find neighbours:
>
> > library(spdep)
> Loading required package: sp
> Loading required package: spData
> Loading required package: sf
> Linking to GEOS 3.10.0, GDAL 3.3.2, PROJ 8.1.1
> > crds <- expand.grid(x=1:800, y=1:1200)
> > dim(crds)
> [1] 960000      2
> > grd <- st_as_sf(crds, coords=c("x", "y"))
> > grd$z <- runif(nrow(grd))
> > system.time(dnbr <- dnearneigh(grd, 0, 1.01))
>     user  system elapsed
>   30.065   0.235  30.381
> > dnbr
> Neighbour list object:
> Number of regions: 960000
> Number of nonzero links: 3836000
> Percentage nonzero weights: 0.0004162326
> Average number of links: 3.995833
> > system.time(dnbq <- dnearneigh(grd, 0, 1.42))
>     user  system elapsed
>   54.502   0.080  54.699
> > dnbq
> Neighbour list object:
> Number of regions: 960000
> Number of nonzero links: 7668004
> Percentage nonzero weights: 0.0008320317
> Average number of links: 7.987504
>
> Once the neighbour objects are ready, conversion to spatial weights
> objects takes some time, and computing I with a constant mean model
> depends on the numbers of neighbours:
>
> > system.time(lwr <- nb2listw(dnbr, style="B"))
>     user  system elapsed
>    6.694   0.000   6.722
> > system.time(Ir <- moran.test(grd$z, lwr))
>     user  system elapsed
>   24.371   0.000  24.470
> > Ir
>
>         Moran I test under randomisation
>
> data:  grd$z
> weights: lwr
>
> Moran I statistic standard deviate = -0.06337, p-value = 0.5253
> alternative hypothesis: greater
> sample estimates:
> Moran I statistic       Expectation          Variance
>      -4.679899e-05     -1.041668e-06      5.213749e-07
>
> > system.time(lwq <- nb2listw(dnbq, style="B"))
>     user  system elapsed
>    6.804   0.000   6.828
> > system.time(Iq <- moran.test(grd$z, lwq))
>     user  system elapsed
>   46.703   0.012  46.843
> > Iq
>
>         Moran I test under randomisation
>
> data:  grd$z
> weights: lwq
>
> Moran I statistic standard deviate = -0.70373, p-value = 0.7592
> alternative hypothesis: greater
> sample estimates:
> Moran I statistic       Expectation          Variance
>      -3.604417e-04     -1.041668e-06      2.608222e-07
>
> 2. The larger your N, the less likely that the test means anything at all,
> because the assumption is that the observed entities are not simply
> arbitrary products of, say, resolution.
>
> If you think of global Moran's I as a specification test of a regression
> of the variable of interest on the constant (the mean model is just the
> constant), for raster data the resolution controls the outcome
> (downscaling/upscaling will shift Moran's I). If you include covariates,
> patterning in the residuals of a richer model may well abate.
>
> Hope this clarifies,
>
> Roger
>
> > Thanks a lot again.
> > Kind regards,
> >
> >
> > On Wed, Oct 20, 2021 at 8:45 PM Roger Bivand <Roger.Bivand using nhh.no>
> wrote:
> >
> >> On Wed, 20 Oct 2021, Gabriel Cotlier wrote:
> >>
> >>> Hello,
> >>>
> >>> I would like to estimate the Moran's *I* coefficient for raster data
> and
> >>> together with the statical significance of the spatial autocorrelation
> >>> obtained.
> >>>
> >>> I found that the raster package function Moran() although calculates
> the
> >>> spatial autocorrelation index it apparently does not give directly the
> >>> statical significance of the results obtained :
> >>> https://search.r-project.org/CRAN/refmans/raster/html/autocor.html
> >>>
> >>> Could it be be possible to obtain the statistical significance of the
> >>> results with either raster package or similar one?
> >>
> >>
> >> fortunes::fortune("This is R")
> >>
> >>
> >> library(raster)
> >> r <- raster(nrows=10, ncols=10)
> >> values(r) <- 1:ncell(r)
> >> f <- matrix(c(0,1,0,1,0,1,0,1,0), nrow=3)
> >> (rI <- Moran(r, f))
> >> r1 <- r
> >> nsim <- 499
> >> res <- numeric(nsim)
> >> set.seed(1)
> >> for (i in 1:nsim) {
> >>    values(r1) <- values(r)[sample(prod(dim(r)))]
> >>    res[i] <- Moran(r1, f)
> >> }
> >>
> >> Hope-type tests date back to Cliff and Ord; they are permutation
> >> bootstraps.
> >>
> >> r_g <- as(r, "SpatialPixelsDataFrame")
> >> library(spdep)
> >> nb <- poly2nb(as(r_g, "SpatialPolygons"), queen=FALSE)
> >> set.seed(1)
> >> o <- moran.mc(r_g$layer, nb2listw(nb, style="B"), nsim=nsim,
> >>    return_boot=TRUE)
> >> x_a <- range(c(o$t, o$t0, res, rI))
> >> plot(density(o$t), xlim=x_a)
> >> abline(v=o$t0)
> >> lines(density(res), lty=2)
> >> abline(v=rI, lty=2)
> >>
> >> It is not immediately obvious from the code of raster::Moran() why it is
> >> different, possibly because of padding the edges of the raster and thus
> >> increasing the cell count.
> >>
> >> For added speed, the bootstrap can be parallelized in both cases;
> polygon
> >> boundaries are perhaps not ideal.
> >>
> >> Hope this clarifies. Always provide a reproducible example, never post
> >> HTML
> >> mail.
> >>
> >> Roger Bivand
> >>
> >>
> >>>
> >>> Thanks a lot.
> >>> Kind regards,
> >>> Gabriel
> >>>
> >>>       [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-Geo mailing list
> >>> R-sig-Geo using r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>
> >>
> >> --
> >> Roger Bivand
> >> Emeritus Professor
> >> Department of Economics, Norwegian School of Economics,
> >> Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
> >> e-mail: Roger.Bivand using nhh.no
> >> https://orcid.org/0000-0003-2392-6140
> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >>
> >
> >
> >
>
> --
> Roger Bivand
> Emeritus Professor
> Department of Economics, Norwegian School of Economics,
> Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
> e-mail: Roger.Bivand using nhh.no
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>

-- 
Gabriel Cotlier, PhD
Haifa Research Center for Theoretical Physics and Astrophysics (HCTPA)
University of Haifa
Israel

	[[alternative HTML version deleted]]