[R-sig-Geo] eigenvectors increase spatial autocorrelation in ols regression

Vinicius Maia v|n|c|u@@@@m@|@77 @end|ng |rom gm@||@com
Mon Jul 27 15:03:36 CEST 2020


Hi Roger,

Thank you for this clarification, it will certainly be useful for me and
others.

Best,

Vinicius

Em sáb., 25 de jul. de 2020 às 08:41, Roger Bivand <Roger.Bivand using nhh.no>
escreveu:

> On Thu, 23 Jul 2020, Vinicius Maia wrote:
>
> > Dear Roger,
> >
> > Why ncf::correlog() should not be used for regression residuals in this
> > case?
> >
> > Could you clarify this to me, please?
>
> I do not have my copy of Cliff & Ord (1973) to hand. The value of Moran's
> I is the same, but the inferences will be affected by changes in the
> expectation and variance (under Normality):
>
> library(spdep)
> columbus <- st_read(system.file("shapes/columbus.shp",
>    package="spData")[1])
> col.gal.nb <- read.gal(system.file("weights/columbus.gal",
>    package="spData")[1])
> OLS <- lm(CRIME ~ INC + HOVAL, data=columbus)
> lm.morantest(OLS, listw=nb2listw(col.gal.nb))
> moran.test(residuals(OLS), listw=nb2listw(col.gal.nb),
>    randomisation=FALSE)
>
> This is because standard regression assumptions affect how we view the
> regression residuals, details in Cliff & Ord. See for example the code in
> moran.test() for E(I) (var(I) is similar):
>
>      EI <- (-1)/wc$n1
>
> where wc$n1 is n - 1 from spweights.constants(). In the lm.morantest()
> case:
>
>      XtXinv <- chol2inv(model$qr$qr[p1, p1, drop = FALSE])
>      X <- model.matrix(terms(model), model.frame(model))
>      if (length(nacoefs) > 0L)
>          X <- X[, -nacoefs]
>      if (!is.null(wts <- weights(model))) {
>          X <- drop(t(sapply(1:length(wts), function(i) sqrt(wts[i]) *
>              X[i, ])))
>      }
>      Z <- lag.listw(listw.U, X, zero.policy = zero.policy)
>      C1 <- t(X) %*% Z
>      trA <- (sum(diag(XtXinv %*% C1)))
>      EI <- -((N * trA)/((N - p) * S0))
>
> and S_0 is the sum of weights (N if row standardised), p is the number
> columns of X, and A = (X'X)^{-1} X'WX. For the distance band case, you
> also need to adjust N for the count of pairs of neighbours within that
> band; in moran.test() you see n <- as.double(length(which(cards > 0))) by
> default, where cards is a vector of neighbour counts; in lm.morantest()
> the equivalent by default is
>   N <- as.double(length(which(card(listw$neighbours) > 0L)))
>
> My presumption is that ncf::correlog() is for input variables, not
> regression residuals, as there is no argument to pass though the fitted
> model object from which to extract the model matrix or (X'X)^{-1}.
> Further, resampling regression residuals may be unsafe for similar
> reasons; the draws should not be from the residuals, I think.
>
> Recall that the code is shown by typing the function name, and this best
> documents what is going on.
>
> Again, refer to Cliff & Ord 1973 for the full development.
>
> Roger
>
>
> >
> > Thank you
> >
> > Best,
> >
> > Vinicius
> >
> > Em qui., 23 de jul. de 2020 às 13:52, Roger Bivand <Roger.Bivand using nhh.no>
> > escreveu:
> >
> >> On Wed, 22 Jul 2020, Peter B. Pearman wrote:
> >>
> >>>
> >>> Dear Roger and list members,
> >>>
> >>> I have a ols regression and want to remove spatial autocorrelation
> (SAC)
> >>> from the residuals, in order to avoid its potential effects of SAC on
> >>> the hypothesis tests (and the reviewers/editor).  I have generated
> >>> spatial eigenvectors with SpatialFiltering(), and added the generated
> >>> vectors to the regression.  Surprisingly, SAC appears to become more
> >>> pronounced.  I also tried ME(), but many more vectors are produced and
> >>> SAC is also not removed. Isn't including the vectors from
> >>> SpatialFiltering() supposed to reduce SAC?
> >>>
> >>> Can you please enlighten me as to what's going on, what I am doing
> >>> wrong, or what I should try?
> >>
> >> You are using point support (your forests are at points not areas, so
> that
> >> the observations are not contiguous). The advice to consider this as a
> >> mixed model problem may be relevant. It is certainly the case that
> >> ncf::correlog() should not be used for regression residuals.
> >>
> >> Further, your data are in a band 1.5-4.3E, 43.0-43.4N, so using 0.7
> >> degrees in dnearneigh() was a risky choice, and squashes the data. If
> you
> >> coerce to an sf object and apply an appropriate CRS, you get many fewer
> >> neighbours on average.
> >>
> >> library(sf)
> >> data <- st_as_sf(data, coords=c("LONG", "LAT"))
> >> (nbnear4 <- dnearneigh(data, 0, 0.7))
> >> st_crs(data) <- 4326
> >> (nbnear4a <- dnearneigh(data, 0, 27))
> >> # or project to a relevant planar UTM 31N spec.
> >> data_utm31 <- st_transform(data, st_crs(32631))
> >> (nbd <- dnearneigh(data_utm31, 0, 27))
> >>
> >> and so on. SpatialFiltering() and ME() test against global residual
> >> autocorrelation in finding candidate eigenvectors, and running
> >> lm.morantest() on the fitted model shows that there is no global
> >> autocorrelation as intended (because positive and negative local
> >> autocorrelation cancels out). I think that the underlying problems are
> >> mixing approaches that are not asking the same questions, and in too
> >> little care in choosing the weights.
> >>
> >> Hope this clarifies,
> >>
> >> Roger
> >>
> >>>
> >>> Thanks in advance for you time.
> >>>
> >>> Peter
> >>>
> >>> The data are here:
> >>>
> >>>
> >>
> https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=
> >>> sharing
> >>>
> >>> The analysis goes like this:
> >>>
> >>> library(spatialreg)
> >>> library(spdep)
> >>> library(tidyverse)
> >>> library(car)
> >>> library(ncf)
> >>>
> >>>
> >>> data <- read_csv("for_RAR.csv")
> >>> set.seed(12345)
> >>> x <- data$ses.mntd
> >>> y <- log(data$RAR)
> >>> ols_for_RAR <- lm(y ~ x)
> >>> ## qqplot() and shapiro.test() show residuals are nicely distributed
> >>> # x is significant and R-square about 0.2, demonstrated here
> >>> car::Anova(ols_for_RAR,type="III")
> >>> summary(ols_for_RAR)
> >>>
> >>> # the following appears to make an acceptable neighbor network
> >>> c1<-c(data$LONG)
> >>> c2<-c(data$LAT)
> >>> cbindForests<-cbind(c1,c2)
> >>> # a value of 0.7, below, is sufficient to join all the points.
> >>> # qualitatively the results aren't affected, as far as I see by setting
> >> this
> >>> higher
> >>> # However, the number of eigenvectors generated by SpatialFiltering()
> >> varies
> >>> a lot
> >>> nbnear4 <- dnearneigh(cbindForests, 0, 0.7)
> >>> plot(nbnear4, cbindForests, col = "red", pch = 20)
> >>>
> >>> # SAC appears significant at short distances (<10km), which is what I
> >> want
> >>> to remove
> >>> cor_for <- correlog(c1, c2, residuals(ols_for_RAR), increment = 1,
> >> resamp =
> >>> 1000, latlon=TRUE, na.rm = TRUE)
> >>> plot(cor_for$correlation[1:20],type="s")
> >>> # p-values
> >>> print(cor_for$p[which(cor_for$p < 0.05)])
> >>> # Moran's I values
> >>> cor_for$correlation[which(cor_for$p < 0.05)]
> >>>
> >>> # Generate optimized spatial eigenvectors using SpatialFiltering() and
> >> use
> >>> them
> >>> # Several vectors are generated depending on values in dnearneigh()
> >>> spfilt_mntd_RAR<- spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style
> =
> >>> "W", tol=0.0001, ExactEV = TRUE)
> >>> new_mod <- lm(y ~ x + fitted(spfilt_mntd_RAR))
> >>> car::Anova(new_mod, type="III")
> >>> summary(new_mod)
> >>>
> >>> # Plot Moran's I at distances under 20km
> >>> cor_for_1c <- correlog(c1, c2, residuals(new_mod), increment = 1,
> resamp
> >> =
> >>> 1000, latlon=TRUE, na.rm = TRUE)
> >>> plot(cor_for_1c$correlation[1:20],type="s")
> >>>
> >>> # Extract significant values of Moran's I
> >>> cor_for_1c$p[which(cor_for_1c$p < 0.05)]
> >>> cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]
> >>>
> >>> # The result is that Moran's I is significant at additional short
> >> distances
> >>> --
> >>>
> >>> _+_+_+_+_+_+_+_+_
> >>>
> >>> Peter B. Pearman
> >>> Ikerbasque Research Professor
> >>>
> >>> Laboratory for Computational Ecology and Evolution
> >>> Departamento de Biología Vegetal y Ecología
> >>> Facultad de Ciencias y Tecnología
> >>> Ap. 644
> >>> Universidad del País Vasco/ Euskal Herriko Unibertsitatea
> >>>
> >>> Barrio Sarriena s/n
> >>> 48940 Leioa, Bizkaia
> >>>
> >>> SPAIN
> >>>
> >>> Tel.  +34 94 601 8030
> >>>
> >>> Fax  +34 94 601 3500
> >>>
> >>> www.ehu.eus/es/web/bgppermp
> >>>
> >>>
> >>>
> >>> When you believe in things that you don’t understand
> >>>
> >>> Then you suffer
> >>>
> >>>                         -- Stevie Wonder
> >>>
> >>>
> >>> Download my public encryption key here: https://pgp.mit.edu/
> >>>
> >>> and please use it when you send me e-mail.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >> --
> >> Roger Bivand
> >> Department of Economics, Norwegian School of Economics,
> >> Helleveien 30, N-5045 Bergen, Norway.
> >> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
> >> https://orcid.org/0000-0003-2392-6140
> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >> _______________________________________________
> >> R-sig-Geo mailing list
> >> R-sig-Geo using r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list