[R-sig-Geo] spatial autocorrelation in GAM residuals for large data set
Roger Bivand
Roger@B|v@nd @end|ng |rom nhh@no
Tue Aug 20 22:43:08 CEST 2019
On Tue, 20 Aug 2019, Elizabeth Webb wrote:
> Hello,
>
> I have a large data set (~100k rows) containing observations at points
> (MODIS pixels) across the northern hemisphere. I have created a GAM
> using the bam command in mgcv and I would like to check the model
> residuals for spatial autocorrelation.
>
> One idea is to use the DHARMa package
> (https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#spatial-autocorrelation).
> The code looks something like this:
>
> simulationOutput <- simulateResiduals(fittedModel = mymodel) # point at which R runs into memory problems
> testSpatialAutocorrelation(simulationOutput = simulationOutput, x = data$latitude, y= data$longitude)
>
> However, this runs into memory problems.
>
> Another idea is to use the following code, after this tutorial
> (http://www.flutterbys.com.au/stats/tut/tut8.4a.html):
> library(ape)
> library(fields)
> coords = cbind(data$longitude, data$latitude)
> w = rdist(coords) # point at which R runs into memory problems
> Moran.I(x = residuals(mymodel), w = w)
>
> But this also runs into memory problems. I have tried increasing the
> amount of memory allotted to R, but that just means R works for longer
> before timing out.
I do hope that you read
https://cran.r-project.org/web/packages/ape/vignettes/MoranI.pdf
first, because the approach used in ape has been revised.
The main problem is that ape uses by default a square matrix, and it is
uncertain whether sparse matrices are accepted. This means that completely
unneeded computations are carried out - dense matrices should never be
used unless there is a convincing scientific argument (see
https://edzer.github.io/UseR2019/part2.html#exercise-review-1 for a
development on why distances are wasteful when edge counts on a graph do
the same thing sparsely).
Use one of the approaches described in the tutorial and you may be OK, but
you should not trust the outcome of Moran's I on residuals without using
an appropriate variant. Say you can represent your GAM with a linear model
with say spline terms, you can use Moran's I for regression residuals.
Take care that the average number of neighbours is very small (6-10), and
large numbers of observations should not be a problem.
A larger problem is that Moran's I (also for residuals) also responds to
other mis-specifications than spatial autocorrelation, in particular
missing variables and spatial processes with a different scale from the
units of observation chosen.
>
> So, two questions: (1) Is there a memory efficient way to check for
> spatial autocorrelation using Moran's I in large data sets? or (2) Is
> there another way to check for spatial autocorrelation (besides Moran's
> I) that won't have such memory problems?
1) Yes, see above, do not use dense matrices
2) Consider a higher level MRF term in your GAM for aggregates of your
observations if such aggregation makes sense for your data.
Hope this clarifies,
Roger
>
> Thanks in advance,
>
> Elizabeth
>
>
>
>
>
>
>
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
More information about the R-sig-Geo
mailing list