[R-sig-Geo] Spatial autocorrelation test in a dataset with multiple observations per region

Wed Dec 18 23:26:02 CET 2024

Hello Everyone,

I have one question on the spatial autoregression tests on primary variables or residuals of linear regression models of a dataset that has multiple observations per region. Here I use columbus dataset and moran.test() on a primary variable CRIME as illustration.

I modified the dataset to have only 7 regions, not 49 regions in original dataset, and each region has 7 observations. I used poly2nb() and nb2listw() to create neighbors and weights list of each region. But after examining the list carefully, I found that two observations within the same region have weight 1. This looks not right. I think poly2nb function treats each observation (a row in the dataset) as a region. Then I modified the neighbors and weights list to have weight 0 for two observations that are in the same region. The moran.test results from original listw object (listw0) and modified listw object (listw) have different Z-scores and p-values. The which moran.test is correct, or are both incorrect?

> library(dplyr)
> library(spdep)
>
> # read dataset and modify it: only 7 regions, 7 observations per region.
> columbus <- st_read(system.file("shapes/columbus.shp", package="spData")[1], quiet=TRUE) %>%
+   mutate(regionId = rep(LETTERS[1:7], each = 7),
+          geometry = rep(.$geometry[1:7], each = 7))
> columbus %>% as.data.frame() %>% select(regionId,geometry) %>% unique() %>% dim()
[1] 7 2
>
> # neighbor list and weights
> nblist <- poly2nb(as(columbus, "Spatial"))
> listw0 <- listw <-  nb2listw(nblist, zero.policy=T, style="B")
>
> # 1.Moran I test on primary variable CRIME using original weights list
> (m.test <- moran.test(columbus$CRIME, listw0,
+                       randomisation=FALSE, alternative="greater"))

     Moran I test under normality

data:  columbus$CRIME
weights: listw0

Moran I statistic standard deviate = 2.6121, p-value = 0.004499
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     0.0497371251     -0.0208333333      0.0007298841

>
> # change weights in listw: two observations within the same region have weight 0.
> columbus$ID = seq(nrow(columbus))
> for (i in seq_along(listw$neighbours)){
+   listw$neighbours[[i]] = setdiff(listw$neighbours[[i]],  subset(as.data.frame(columbus), regionId == as.data.frame(columbus)[i, "regionId"])$ID)
+   listw$weights[[i]] = rep(1, length(listw$neighbours[[i]]))
+
+   if (length(listw$neighbours[[i]]) == 0) {
+     listw$neighbours[[i]] = 0L
+     listw$weights[i] = list(NULL)
+   }
+ }
>
> # 2.Moran I test on primary variable CRIME using modified weights list
> (m.test <- moran.test(columbus$CRIME, listw,
+                       randomisation=FALSE, alternative="greater"))

     Moran I test under normality

data:  columbus$CRIME
weights: listw  n reduced by no-neighbour observations

Moran I statistic standard deviate = 1.1726, p-value = 0.1205
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
      0.014619659      -0.024390244       0.001106761

>
> # the neighbors and weights of the first observation
> ## original listw
> card(listw0$neighbours)[[1]]
[1] 20
> listw0$neighbours[[1]]
[1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
> listw0$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> ## modified listw
> card(listw$neighbours)[[1]]
[1] 14
> listw$neighbours[[1]]
[1]  8  9 10 11 12 13 14 15 16 17 18 19 20 21
> listw$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> # overview of listw0 and listw
> listw0
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 1078
Percentage nonzero weights: 44.89796
Average number of links: 22
2 disjoint connected subgraphs

Weights style: B
Weights constants summary:
   n   nn   S0   S1     S2
B 49 2401 1078 2156 110544
> listw
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 784
Percentage nonzero weights: 32.65306
Average number of links: 16
7 regions with no links:
43 44 45 46 47 48 49
8 disjoint connected subgraphs

Weights style: B
Weights constants summary:
   n   nn  S0   S1    S2
B 42 1764 784 1568 65856

Thank you in advance and sorry for the inconvenience.

Hongyun Wang

Sr. Data Scientist � Contracted
Bayer AG
hongyun.wang.ext using bayer.com

________________________________

The information contained in this e-mail is for the excl...{{dropped:14}}