[R-sig-Geo] eigenvectors increase spatial autocorrelation in ols regression

Thu Jul 23 18:52:27 CEST 2020

On Wed, 22 Jul 2020, Peter B. Pearman wrote:

> 
> Dear Roger and list members,
> 
> I have a ols regression and want to remove spatial autocorrelation (SAC) 
> from the residuals, in order to avoid its potential effects of SAC on 
> the hypothesis tests (and the reviewers/editor).  I have generated 
> spatial eigenvectors with SpatialFiltering(), and added the generated 
> vectors to the regression.  Surprisingly, SAC appears to become more 
> pronounced.  I also tried ME(), but many more vectors are produced and 
> SAC is also not removed. Isn't including the vectors from 
> SpatialFiltering() supposed to reduce SAC?
> 
> Can you please enlighten me as to what's going on, what I am doing 
> wrong, or what I should try?

You are using point support (your forests are at points not areas, so that 
the observations are not contiguous). The advice to consider this as a 
mixed model problem may be relevant. It is certainly the case that 
ncf::correlog() should not be used for regression residuals.

Further, your data are in a band 1.5-4.3E, 43.0-43.4N, so using 0.7 
degrees in dnearneigh() was a risky choice, and squashes the data. If you 
coerce to an sf object and apply an appropriate CRS, you get many fewer 
neighbours on average.

library(sf)
data <- st_as_sf(data, coords=c("LONG", "LAT"))
(nbnear4 <- dnearneigh(data, 0, 0.7))
st_crs(data) <- 4326
(nbnear4a <- dnearneigh(data, 0, 27))
# or project to a relevant planar UTM 31N spec.
data_utm31 <- st_transform(data, st_crs(32631))
(nbd <- dnearneigh(data_utm31, 0, 27))

and so on. SpatialFiltering() and ME() test against global residual 
autocorrelation in finding candidate eigenvectors, and running 
lm.morantest() on the fitted model shows that there is no global 
autocorrelation as intended (because positive and negative local 
autocorrelation cancels out). I think that the underlying problems are 
mixing approaches that are not asking the same questions, and in too 
little care in choosing the weights.

Hope this clarifies,

Roger

> 
> Thanks in advance for you time.
> 
> Peter
> 
> The data are here:
> 
> https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=
> sharing
> 
> The analysis goes like this:
> 
> library(spatialreg)
> library(spdep)
> library(tidyverse)
> library(car)
> library(ncf)
> 
> 
> data <- read_csv("for_RAR.csv")
> set.seed(12345)
> x <- data$ses.mntd
> y <- log(data$RAR)
> ols_for_RAR <- lm(y ~ x)
> ## qqplot() and shapiro.test() show residuals are nicely distributed
> # x is significant and R-square about 0.2, demonstrated here
> car::Anova(ols_for_RAR,type="III")
> summary(ols_for_RAR)
> 
> # the following appears to make an acceptable neighbor network
> c1<-c(data$LONG)
> c2<-c(data$LAT)
> cbindForests<-cbind(c1,c2)
> # a value of 0.7, below, is sufficient to join all the points.
> # qualitatively the results aren't affected, as far as I see by setting this
> higher
> # However, the number of eigenvectors generated by SpatialFiltering() varies
> a lot
> nbnear4 <- dnearneigh(cbindForests, 0, 0.7)
> plot(nbnear4, cbindForests, col = "red", pch = 20)
> 
> # SAC appears significant at short distances (<10km), which is what I want
> to remove
> cor_for <- correlog(c1, c2, residuals(ols_for_RAR), increment = 1, resamp =
> 1000, latlon=TRUE, na.rm = TRUE)
> plot(cor_for$correlation[1:20],type="s")
> # p-values
> print(cor_for$p[which(cor_for$p < 0.05)])
> # Moran's I values
> cor_for$correlation[which(cor_for$p < 0.05)]
> 
> # Generate optimized spatial eigenvectors using SpatialFiltering() and use
> them
> # Several vectors are generated depending on values in dnearneigh()
> spfilt_mntd_RAR<- spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style =
> "W", tol=0.0001, ExactEV = TRUE)
> new_mod <- lm(y ~ x + fitted(spfilt_mntd_RAR))
> car::Anova(new_mod, type="III")
> summary(new_mod)
> 
> # Plot Moran's I at distances under 20km
> cor_for_1c <- correlog(c1, c2, residuals(new_mod), increment = 1, resamp =
> 1000, latlon=TRUE, na.rm = TRUE)
> plot(cor_for_1c$correlation[1:20],type="s")
> 
> # Extract significant values of Moran's I
> cor_for_1c$p[which(cor_for_1c$p < 0.05)]
> cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
> 
> # The result is that Moran's I is significant at additional short distances
> --
> 
> _+_+_+_+_+_+_+_+_
> 
> Peter B. Pearman
> Ikerbasque Research Professor
> 
> Laboratory for Computational Ecology and Evolution
> Departamento de Biología Vegetal y Ecología
> Facultad de Ciencias y Tecnología
> Ap. 644
> Universidad del País Vasco/ Euskal Herriko Unibertsitatea
> 
> Barrio Sarriena s/n
> 48940 Leioa, Bizkaia
> 
> SPAIN
> 
> Tel.  +34 94 601 8030
> 
> Fax  +34 94 601 3500
> 
> www.ehu.eus/es/web/bgppermp
> 
>  
> 
> When you believe in things that you don’t understand
> 
> Then you suffer
> 
>                         -- Stevie Wonder
> 
> 
> Download my public encryption key here: https://pgp.mit.edu/
> 
> and please use it when you send me e-mail.
> 
>  
> 
> 
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en