[R-sig-Geo] spatial autocorrelation in GAM residuals for large data set

Tue Aug 20 15:46:01 CEST 2019

Hello,

I have a large data set (~100k rows) of observations at points (MODIS pixels) across the Northern Hemisphere. I have fitted a GAM using the bam() function in mgcv, and I would like to check the model residuals for spatial autocorrelation.

One idea is to use the DHARMa package (https://cran.r-project.org/web/packages/DHARMa/vignettes/DHARMa.html#spatial-autocorrelation).  The code looks something like this:

    simulationOutput <- simulateResiduals(fittedModel = mymodel)  # point at which R runs into memory problems
    testSpatialAutocorrelation(simulationOutput = simulationOutput, x = data$longitude, y = data$latitude)

However, this runs into memory problems.  

Another idea is to use the following code, adapted from this tutorial (http://www.flutterbys.com.au/stats/tut/tut8.4a.html):

    library(ape)
    library(fields)
    coords <- cbind(data$longitude, data$latitude)
    w <- 1 / rdist(coords)  # inverse-distance weights; point at which R runs into memory problems
    diag(w) <- 0            # a point is not its own neighbour
    Moran.I(x = residuals(mymodel), weight = w)

But this also runs into memory problems.  I have tried increasing the amount of memory allotted to R, but that just means R works for longer before timing out.  
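For scale: a dense n x n matrix of doubles with n = 100,000 has 10^10 entries at 8 bytes each, about 80 GB, and turning distances into weights typically needs a second matrix of the same size, so adding memory alone is unlikely to be enough. One possible workaround for question (1), sketched below under my own assumptions, is to accumulate Moran's I one block of rows at a time, so that only a block_size x n slice of the weight matrix exists at once (with block_size = 200 and n = 100,000 each slice is 200 x 100,000 x 8 bytes = 160 MB). The helper name moran_i_blocked(), the inverse-distance weights with zero self-weights, the row standardisation (as far as I can tell ape::Moran.I also row-standardises internally) and the plain Euclidean distance on the coordinates are all assumptions; for longitude/latitude across the whole Northern Hemisphere, projecting the coordinates first would probably be more sensible. It returns only the statistic, and run time still grows with n^2. Sparse k-nearest-neighbour weights (e.g. spdep's knearneigh(), knn2nb(), nb2listw() and moran.test()) would be another way to avoid the dense matrix altogether.

```r
# Minimal sketch (hypothetical helper, not from any package): global Moran's I
# with inverse-distance weights, accumulated one block of rows at a time so the
# full n x n weight matrix never exists in memory.
# Assumptions: Euclidean distance on `coords`, w_ij = 1/d_ij, zero weight for
# self-pairs and exact duplicate locations, optional row standardisation.
# Returns the statistic only (no p-value); run time is still O(n^2).
moran_i_blocked <- function(x, coords, block_size = 200, row_standardize = TRUE) {
  coords <- as.matrix(coords)
  n <- length(x)
  stopifnot(nrow(coords) == n, n > 1)
  z <- x - mean(x)
  sum_w <- 0  # running total of all weights (W)
  cross <- 0  # running total of w_ij * z_i * z_j
  for (start in seq(1, n, by = block_size)) {
    rows <- start:min(start + block_size - 1, n)
    # Squared distances from this block of points to every point
    d2 <- 0
    for (k in seq_len(ncol(coords))) {
      d2 <- d2 + outer(coords[rows, k], coords[, k], "-")^2
    }
    w <- 1 / sqrt(d2)
    w[d2 == 0] <- 0  # no self-weights (also drops duplicate locations)
    if (row_standardize) {
      rs <- rowSums(w)
      rs[rs == 0] <- 1  # leave rows without neighbours at zero
      w <- w / rs       # divides row i of the block by its own row sum
    }
    sum_w <- sum_w + sum(w)
    cross <- cross + sum(z[rows] * (w %*% z))
  }
  (n / sum_w) * cross / sum(z^2)
}

# Usage with the objects from the post (not run here):
# moran_i_blocked(residuals(mymodel), cbind(data$longitude, data$latitude))
```

A rough p-value could be obtained by repeating the call on a handful of random permutations of the residuals and comparing, at the cost of one full pass over the data per permutation.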

So, two questions: (1) Is there a memory-efficient way to check for spatial autocorrelation using Moran's I in large data sets? Or (2) is there another way to check for spatial autocorrelation (besides Moran's I) that won't have such memory problems?
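For question (2), one common alternative to a single Moran's I is an empirical semivariogram (or correlogram) of the residuals: without spatial autocorrelation the semivariance should be roughly flat at the residual variance across all distance bins, while lower values at short distances point to autocorrelation. Because each bin is an average over many pairs, it can be estimated from a random subsample of locations, which keeps the distance matrix small (2,000 points give about 2 million pairs, roughly 16 MB per vector). The sketch below is base R only; the helper name residual_variogram(), its arguments, the Euclidean distances and the default cut-off at half the maximum distance are my own assumptions. Packages such as gstat (variogram()) or ncf (correlog()) provide more complete versions.

```r
# Minimal sketch (hypothetical helper): binned empirical semivariogram of
# residuals, estimated from a random subsample so the distance matrix stays small.
# Assumes Euclidean distances on `coords`; call set.seed() beforehand for a
# reproducible subsample.
residual_variogram <- function(r, coords, n_sample = 2000, n_bins = 15,
                               max_dist = NULL) {
  coords <- as.matrix(coords)
  stopifnot(nrow(coords) == length(r))
  idx <- sample(length(r), min(n_sample, length(r)))
  # Pairwise distances and half squared residual differences; both dist()
  # calls list the pairs in the same (lower-triangle) order.
  d <- as.vector(dist(coords[idx, , drop = FALSE]))
  g <- 0.5 * as.vector(dist(r[idx]))^2
  if (is.null(max_dist)) max_dist <- max(d) / 2
  keep <- d <= max_dist
  bins <- cut(d[keep], breaks = seq(0, max_dist, length.out = n_bins + 1),
              include.lowest = TRUE)
  data.frame(
    dist    = as.vector(tapply(d[keep], bins, mean)),  # mean distance in bin
    gamma   = as.vector(tapply(g[keep], bins, mean)),  # semivariance in bin
    n_pairs = as.vector(table(bins))                   # pairs per bin
  )
}
```

With the post's objects this would be something like vg <- residual_variogram(residuals(mymodel), cbind(data$longitude, data$latitude)) followed by plot(gamma ~ dist, data = vg); a curve that rises over the first few bins before levelling off would suggest the residuals are still spatially structured.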

Thanks in advance,

Elizabeth