[R-sig-Geo] eigenvectors increase spatial autocorrelation in ols regression

Roger Bivand Roger@B|v@nd @end|ng |rom nhh@no
Sat Jul 25 13:40:48 CEST 2020


On Thu, 23 Jul 2020, Vinicius Maia wrote:

> Dear Roger,
>
> Why ncf::correlog() should not be used for regression residuals in this
> case?
>
> Could you clarify this to me, please?

I do not have my copy of Cliff & Ord (1973) to hand. The value of Moran's 
I is the same, but the inferences will be affected by changes in the 
expectation and variance (under Normality):

library(spdep)
columbus <- st_read(system.file("shapes/columbus.shp",
   package="spData")[1])
col.gal.nb <- read.gal(system.file("weights/columbus.gal",
   package="spData")[1])
OLS <- lm(CRIME ~ INC + HOVAL, data=columbus)
lm.morantest(OLS, listw=nb2listw(col.gal.nb))
moran.test(residuals(OLS), listw=nb2listw(col.gal.nb),
   randomisation=FALSE)

This is because standard regression assumptions affect how we view the 
regression residuals, details in Cliff & Ord. See for example the code in 
moran.test() for E(I) (var(I) is similar):

     EI <- (-1)/wc$n1

where wc$n1 is n - 1 from spweights.constants(). In the lm.morantest() 
case:

     XtXinv <- chol2inv(model$qr$qr[p1, p1, drop = FALSE])
     X <- model.matrix(terms(model), model.frame(model))
     if (length(nacoefs) > 0L)
         X <- X[, -nacoefs]
     if (!is.null(wts <- weights(model))) {
         X <- drop(t(sapply(1:length(wts), function(i) sqrt(wts[i]) *
             X[i, ])))
     }
     Z <- lag.listw(listw.U, X, zero.policy = zero.policy)
     C1 <- t(X) %*% Z
     trA <- (sum(diag(XtXinv %*% C1)))
     EI <- -((N * trA)/((N - p) * S0))

and S_0 is the sum of weights (N if row standardised), p is the number 
columns of X, and A = (X'X)^{-1} X'WX. For the distance band case, you 
also need to adjust N for the count of pairs of neighbours within that 
band; in moran.test() you see n <- as.double(length(which(cards > 0))) by 
default, where cards is a vector of neighbour counts; in lm.morantest() 
the equivalent by default is
  N <- as.double(length(which(card(listw$neighbours) > 0L)))

My presumption is that ncf::correlog() is for input variables, not 
regression residuals, as there is no argument to pass though the fitted 
model object from which to extract the model matrix or (X'X)^{-1}. 
Further, resampling regression residuals may be unsafe for similar 
reasons; the draws should not be from the residuals, I think.

Recall that the code is shown by typing the function name, and this best 
documents what is going on.

Again, refer to Cliff & Ord 1973 for the full development.

Roger


>
> Thank you
>
> Best,
>
> Vinicius
>
> Em qui., 23 de jul. de 2020 às 13:52, Roger Bivand <Roger.Bivand using nhh.no>
> escreveu:
>
>> On Wed, 22 Jul 2020, Peter B. Pearman wrote:
>>
>>>
>>> Dear Roger and list members,
>>>
>>> I have a ols regression and want to remove spatial autocorrelation (SAC)
>>> from the residuals, in order to avoid its potential effects of SAC on
>>> the hypothesis tests (and the reviewers/editor).  I have generated
>>> spatial eigenvectors with SpatialFiltering(), and added the generated
>>> vectors to the regression.  Surprisingly, SAC appears to become more
>>> pronounced.  I also tried ME(), but many more vectors are produced and
>>> SAC is also not removed. Isn't including the vectors from
>>> SpatialFiltering() supposed to reduce SAC?
>>>
>>> Can you please enlighten me as to what's going on, what I am doing
>>> wrong, or what I should try?
>>
>> You are using point support (your forests are at points not areas, so that
>> the observations are not contiguous). The advice to consider this as a
>> mixed model problem may be relevant. It is certainly the case that
>> ncf::correlog() should not be used for regression residuals.
>>
>> Further, your data are in a band 1.5-4.3E, 43.0-43.4N, so using 0.7
>> degrees in dnearneigh() was a risky choice, and squashes the data. If you
>> coerce to an sf object and apply an appropriate CRS, you get many fewer
>> neighbours on average.
>>
>> library(sf)
>> data <- st_as_sf(data, coords=c("LONG", "LAT"))
>> (nbnear4 <- dnearneigh(data, 0, 0.7))
>> st_crs(data) <- 4326
>> (nbnear4a <- dnearneigh(data, 0, 27))
>> # or project to a relevant planar UTM 31N spec.
>> data_utm31 <- st_transform(data, st_crs(32631))
>> (nbd <- dnearneigh(data_utm31, 0, 27))
>>
>> and so on. SpatialFiltering() and ME() test against global residual
>> autocorrelation in finding candidate eigenvectors, and running
>> lm.morantest() on the fitted model shows that there is no global
>> autocorrelation as intended (because positive and negative local
>> autocorrelation cancels out). I think that the underlying problems are
>> mixing approaches that are not asking the same questions, and in too
>> little care in choosing the weights.
>>
>> Hope this clarifies,
>>
>> Roger
>>
>>>
>>> Thanks in advance for you time.
>>>
>>> Peter
>>>
>>> The data are here:
>>>
>>>
>> https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=
>>> sharing
>>>
>>> The analysis goes like this:
>>>
>>> library(spatialreg)
>>> library(spdep)
>>> library(tidyverse)
>>> library(car)
>>> library(ncf)
>>>
>>>
>>> data <- read_csv("for_RAR.csv")
>>> set.seed(12345)
>>> x <- data$ses.mntd
>>> y <- log(data$RAR)
>>> ols_for_RAR <- lm(y ~ x)
>>> ## qqplot() and shapiro.test() show residuals are nicely distributed
>>> # x is significant and R-square about 0.2, demonstrated here
>>> car::Anova(ols_for_RAR,type="III")
>>> summary(ols_for_RAR)
>>>
>>> # the following appears to make an acceptable neighbor network
>>> c1<-c(data$LONG)
>>> c2<-c(data$LAT)
>>> cbindForests<-cbind(c1,c2)
>>> # a value of 0.7, below, is sufficient to join all the points.
>>> # qualitatively the results aren't affected, as far as I see by setting
>> this
>>> higher
>>> # However, the number of eigenvectors generated by SpatialFiltering()
>> varies
>>> a lot
>>> nbnear4 <- dnearneigh(cbindForests, 0, 0.7)
>>> plot(nbnear4, cbindForests, col = "red", pch = 20)
>>>
>>> # SAC appears significant at short distances (<10km), which is what I
>> want
>>> to remove
>>> cor_for <- correlog(c1, c2, residuals(ols_for_RAR), increment = 1,
>> resamp =
>>> 1000, latlon=TRUE, na.rm = TRUE)
>>> plot(cor_for$correlation[1:20],type="s")
>>> # p-values
>>> print(cor_for$p[which(cor_for$p < 0.05)])
>>> # Moran's I values
>>> cor_for$correlation[which(cor_for$p < 0.05)]
>>>
>>> # Generate optimized spatial eigenvectors using SpatialFiltering() and
>> use
>>> them
>>> # Several vectors are generated depending on values in dnearneigh()
>>> spfilt_mntd_RAR<- spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style =
>>> "W", tol=0.0001, ExactEV = TRUE)
>>> new_mod <- lm(y ~ x + fitted(spfilt_mntd_RAR))
>>> car::Anova(new_mod, type="III")
>>> summary(new_mod)
>>>
>>> # Plot Moran's I at distances under 20km
>>> cor_for_1c <- correlog(c1, c2, residuals(new_mod), increment = 1, resamp
>> =
>>> 1000, latlon=TRUE, na.rm = TRUE)
>>> plot(cor_for_1c$correlation[1:20],type="s")
>>>
>>> # Extract significant values of Moran's I
>>> cor_for_1c$p[which(cor_for_1c$p < 0.05)]
>>> cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
>>>
>>> # The result is that Moran's I is significant at additional short
>> distances
>>> --
>>>
>>> _+_+_+_+_+_+_+_+_
>>>
>>> Peter B. Pearman
>>> Ikerbasque Research Professor
>>>
>>> Laboratory for Computational Ecology and Evolution
>>> Departamento de Biología Vegetal y Ecología
>>> Facultad de Ciencias y Tecnología
>>> Ap. 644
>>> Universidad del País Vasco/ Euskal Herriko Unibertsitatea
>>>
>>> Barrio Sarriena s/n
>>> 48940 Leioa, Bizkaia
>>>
>>> SPAIN
>>>
>>> Tel.  +34 94 601 8030
>>>
>>> Fax  +34 94 601 3500
>>>
>>> www.ehu.eus/es/web/bgppermp
>>>
>>>
>>>
>>> When you believe in things that you don’t understand
>>>
>>> Then you suffer
>>>
>>>                         -- Stevie Wonder
>>>
>>>
>>> Download my public encryption key here: https://pgp.mit.edu/
>>>
>>> and please use it when you send me e-mail.
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>> _______________________________________________
>> R-sig-Geo mailing list
>> R-sig-Geo using r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en


More information about the R-sig-Geo mailing list