[R-sig-Geo] question on raster Moran's I statistical significance

Fri Oct 22 20:25:00 CEST 2021

Hello,

With regards to my intention of solely estimating the pseudo p-value of
Moran's I from raster data layer, by converting it to polygon data in
order to be able to directly use the function  spdep::mc.moran() and
getting in one step not only value of Moran'sI but its statistical
significance without using other simulations but only that in the funcion
spdep::mc.moran() as an alternative I have used the code below. However,
since with the initial version of the code I got  several errors such as :

Error in moran. mc(My_raster, lw.l, 999) : r is not a numeric vector

So I converted "r" to a vector using as.vector() in :

M <- moran.mc(as.vector(r), lw.l, 999)

However then appeared another the error:

Error in na.fail.default(x) : missing values in object

Finally these errors disappeared with the following version of the code--
see below -- the version of the code below runs OK--unless for the input
raster I have used-- with no more errors or problems.

Therefore my question is : if instead of using arguments of the function
spdep::mc.moran() such as na.action = na.omit or other --which I have tried
and could not make work-- is it a valid way to solve the above errors  in
the way as presented in the code below?

Thanks a lot.
Kind regards,

################### Estimate of Moran's I and its p-value from a raster
layer ##################################
library(raster)
library(spdep)
library(easycsv)

rm(list = ls())

r<- raster(file.choose())
l <- rasterToPolygons(r)

nb.l <- poly2nb(l, queen=FALSE)
lw.l <- nb2listw(nb.l, style="W",zero.policy=TRUE)

v<-r[!is.na(r)]
M <- moran.mc((v), lw.l, 999, zero.policy=TRUE)
M

plot(M, main=sprintf("Original, Moran's I (p = %0.3f)", M$p.value),
xlab="Monte-Carlo Simulation of Moran I", sub =NA, cex.lab=1.5,
cex.axis=1.5, cex.main=1.5, cex.sub=1.5)
abline(v=M$statistic, col="red", lw=2)
dev.off()
################################################################################################

On Thu, Oct 21, 2021 at 1:13 PM Gabriel Cotlier <gabiklm01 using gmail.com> wrote:

> Hello
> Thanks a lot.
> Indeed your explanation makes it much more clear.
> Kind regards,
>
>
> On Thu, Oct 21, 2021 at 11:40 AM Roger Bivand <Roger.Bivand using nhh.no> wrote:
>
>> On Thu, 21 Oct 2021, Gabriel Cotlier wrote:
>>
>> > Hello
>> >
>> > Thank you very much.
>> > I have large raster layers and would like to ask, in order to reduce the
>> > processing time of the simulation, choosing a smaller nsim value could
>> help
>> > ? and if so, what could be the minimum nsim value recommended?
>>
>> If you examine the code, you see that raster::Moran() always performs
>> multiple raster::focal() calls, effectively constructing the spatial
>> weights on each run. The problem is not nsim, it is the way that
>> raster::Moran() is constructed.
>>
>> Indeed, using a Hope-type test gives the same inferential outcome as
>> using
>> asymptotic methods for global tests for spatial autocorrelation, and is
>> superfluous, but no other approach is feasible for raster::Moran() (which
>> by the way only seems to use 4 neighbours although claiming it uses 8).
>>
>> 1. How large is large?
>>
>> It is possible to generate nb neighbour objects for large data sets,
>> making the use of spdep::moran.test() feasible, certainly for tens of
>> thousands of observations (all the census tracts in coterminous US, for
>> example).
>>
>> This uses distances between raster cell centres to find neighbours:
>>
>> > library(spdep)
>> Loading required package: sp
>> Loading required package: spData
>> Loading required package: sf
>> Linking to GEOS 3.10.0, GDAL 3.3.2, PROJ 8.1.1
>> > crds <- expand.grid(x=1:800, y=1:1200)
>> > dim(crds)
>> [1] 960000      2
>> > grd <- st_as_sf(crds, coords=c("x", "y"))
>> > grd$z <- runif(nrow(grd))
>> > system.time(dnbr <- dnearneigh(grd, 0, 1.01))
>>     user  system elapsed
>>   30.065   0.235  30.381
>> > dnbr
>> Neighbour list object:
>> Number of regions: 960000
>> Number of nonzero links: 3836000
>> Percentage nonzero weights: 0.0004162326
>> Average number of links: 3.995833
>> > system.time(dnbq <- dnearneigh(grd, 0, 1.42))
>>     user  system elapsed
>>   54.502   0.080  54.699
>> > dnbq
>> Neighbour list object:
>> Number of regions: 960000
>> Number of nonzero links: 7668004
>> Percentage nonzero weights: 0.0008320317
>> Average number of links: 7.987504
>>
>> Once the neighbour objects are ready, conversion to spatial weights
>> objects takes some time, and computing I with a constant mean model
>> depends on the numbers of neighbours:
>>
>> > system.time(lwr <- nb2listw(dnbr, style="B"))
>>     user  system elapsed
>>    6.694   0.000   6.722
>> > system.time(Ir <- moran.test(grd$z, lwr))
>>     user  system elapsed
>>   24.371   0.000  24.470
>> > Ir
>>
>>         Moran I test under randomisation
>>
>> data:  grd$z
>> weights: lwr
>>
>> Moran I statistic standard deviate = -0.06337, p-value = 0.5253
>> alternative hypothesis: greater
>> sample estimates:
>> Moran I statistic       Expectation          Variance
>>      -4.679899e-05     -1.041668e-06      5.213749e-07
>>
>> > system.time(lwq <- nb2listw(dnbq, style="B"))
>>     user  system elapsed
>>    6.804   0.000   6.828
>> > system.time(Iq <- moran.test(grd$z, lwq))
>>     user  system elapsed
>>   46.703   0.012  46.843
>> > Iq
>>
>>         Moran I test under randomisation
>>
>> data:  grd$z
>> weights: lwq
>>
>> Moran I statistic standard deviate = -0.70373, p-value = 0.7592
>> alternative hypothesis: greater
>> sample estimates:
>> Moran I statistic       Expectation          Variance
>>      -3.604417e-04     -1.041668e-06      2.608222e-07
>>
>> 2. The larger your N, the less likely that the test means anything at
>> all,
>> because the assumption is that the observed entities are not simply
>> arbitrary products of, say, resolution.
>>
>> If you think of global Moran's I as a specification test of a regression
>> of the variable of interest on the constant (the mean model is just the
>> constant), for raster data the resolution controls the outcome
>> (downscaling/upscaling will shift Moran's I). If you include covariates,
>> patterning in the residuals of a richer model may well abate.
>>
>> Hope this clarifies,
>>
>> Roger
>>
>> > Thanks a lot again.
>> > Kind regards,
>> >
>> >
>> > On Wed, Oct 20, 2021 at 8:45 PM Roger Bivand <Roger.Bivand using nhh.no>
>> wrote:
>> >
>> >> On Wed, 20 Oct 2021, Gabriel Cotlier wrote:
>> >>
>> >>> Hello,
>> >>>
>> >>> I would like to estimate the Moran's *I* coefficient for raster data
>> and
>> >>> together with the statical significance of the spatial autocorrelation
>> >>> obtained.
>> >>>
>> >>> I found that the raster package function Moran() although calculates
>> the
>> >>> spatial autocorrelation index it apparently does not give directly the
>> >>> statical significance of the results obtained :
>> >>> https://search.r-project.org/CRAN/refmans/raster/html/autocor.html
>> >>>
>> >>> Could it be be possible to obtain the statistical significance of the
>> >>> results with either raster package or similar one?
>> >>
>> >>
>> >> fortunes::fortune("This is R")
>> >>
>> >>
>> >> library(raster)
>> >> r <- raster(nrows=10, ncols=10)
>> >> values(r) <- 1:ncell(r)
>> >> f <- matrix(c(0,1,0,1,0,1,0,1,0), nrow=3)
>> >> (rI <- Moran(r, f))
>> >> r1 <- r
>> >> nsim <- 499
>> >> res <- numeric(nsim)
>> >> set.seed(1)
>> >> for (i in 1:nsim) {
>> >>    values(r1) <- values(r)[sample(prod(dim(r)))]
>> >>    res[i] <- Moran(r1, f)
>> >> }
>> >>
>> >> Hope-type tests date back to Cliff and Ord; they are permutation
>> >> bootstraps.
>> >>
>> >> r_g <- as(r, "SpatialPixelsDataFrame")
>> >> library(spdep)
>> >> nb <- poly2nb(as(r_g, "SpatialPolygons"), queen=FALSE)
>> >> set.seed(1)
>> >> o <- moran.mc(r_g$layer, nb2listw(nb, style="B"), nsim=nsim,
>> >>    return_boot=TRUE)
>> >> x_a <- range(c(o$t, o$t0, res, rI))
>> >> plot(density(o$t), xlim=x_a)
>> >> abline(v=o$t0)
>> >> lines(density(res), lty=2)
>> >> abline(v=rI, lty=2)
>> >>
>> >> It is not immediately obvious from the code of raster::Moran() why it
>> is
>> >> different, possibly because of padding the edges of the raster and thus
>> >> increasing the cell count.
>> >>
>> >> For added speed, the bootstrap can be parallelized in both cases;
>> polygon
>> >> boundaries are perhaps not ideal.
>> >>
>> >> Hope this clarifies. Always provide a reproducible example, never post
>> >> HTML
>> >> mail.
>> >>
>> >> Roger Bivand
>> >>
>> >>
>> >>>
>> >>> Thanks a lot.
>> >>> Kind regards,
>> >>> Gabriel
>> >>>
>> >>>       [[alternative HTML version deleted]]
>> >>>
>> >>> _______________________________________________
>> >>> R-sig-Geo mailing list
>> >>> R-sig-Geo using r-project.org
>> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>> >>>
>> >>
>> >> --
>> >> Roger Bivand
>> >> Emeritus Professor
>> >> Department of Economics, Norwegian School of Economics,
>> >> Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
>> >> e-mail: Roger.Bivand using nhh.no
>> >> https://orcid.org/0000-0003-2392-6140
>> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>> >>
>> >
>> >
>> >
>>
>> --
>> Roger Bivand
>> Emeritus Professor
>> Department of Economics, Norwegian School of Economics,
>> Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway.
>> e-mail: Roger.Bivand using nhh.no
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>
>
>
> --
> Gabriel Cotlier, PhD
> Haifa Research Center for Theoretical Physics and Astrophysics (HCTPA)
> University of Haifa
> Israel
>
>
>

-- 
Gabriel Cotlier, PhD
Haifa Research Center for Theoretical Physics and Astrophysics (HCTPA)
University of Haifa
Israel

	[[alternative HTML version deleted]]