[R-sig-Geo] Spatial autocorrelation test in a dataset with multiple observations per region
Hongyun Wang
hongyun@w@ng@ext @end|ng |rom b@yer@com
Wed Dec 18 23:26:02 CET 2024
Hello Everyone,
I have one question on the spatial autoregression tests on primary variables or residuals of linear regression models of a dataset that has multiple observations per region. Here I use columbus dataset and moran.test() on a primary variable CRIME as illustration.
I modified the dataset to have only 7 regions, not 49 regions in original dataset, and each region has 7 observations. I used poly2nb() and nb2listw() to create neighbors and weights list of each region. But after examining the list carefully, I found that two observations within the same region have weight 1. This looks not right. I think poly2nb function treats each observation (a row in the dataset) as a region. Then I modified the neighbors and weights list to have weight 0 for two observations that are in the same region. The moran.test results from original listw object (listw0) and modified listw object (listw) have different Z-scores and p-values. The which moran.test is correct, or are both incorrect?
> library(dplyr)
> library(spdep)
>
> # read dataset and modify it: only 7 regions, 7 observations per region.
> columbus <- st_read(system.file("shapes/columbus.shp", package="spData")[1], quiet=TRUE) %>%
+ mutate(regionId = rep(LETTERS[1:7], each = 7),
+ geometry = rep(.$geometry[1:7], each = 7))
> columbus %>% as.data.frame() %>% select(regionId,geometry) %>% unique() %>% dim()
[1] 7 2
>
> # neighbor list and weights
> nblist <- poly2nb(as(columbus, "Spatial"))
> listw0 <- listw <- nb2listw(nblist, zero.policy=T, style="B")
>
> # 1.Moran I test on primary variable CRIME using original weights list
> (m.test <- moran.test(columbus$CRIME, listw0,
+ randomisation=FALSE, alternative="greater"))
Moran I test under normality
data: columbus$CRIME
weights: listw0
Moran I statistic standard deviate = 2.6121, p-value = 0.004499
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.0497371251 -0.0208333333 0.0007298841
>
> # change weights in listw: two observations within the same region have weight 0.
> columbus$ID = seq(nrow(columbus))
> for (i in seq_along(listw$neighbours)){
+ listw$neighbours[[i]] = setdiff(listw$neighbours[[i]], subset(as.data.frame(columbus), regionId == as.data.frame(columbus)[i, "regionId"])$ID)
+ listw$weights[[i]] = rep(1, length(listw$neighbours[[i]]))
+
+ if (length(listw$neighbours[[i]]) == 0) {
+ listw$neighbours[[i]] = 0L
+ listw$weights[i] = list(NULL)
+ }
+ }
>
> # 2.Moran I test on primary variable CRIME using modified weights list
> (m.test <- moran.test(columbus$CRIME, listw,
+ randomisation=FALSE, alternative="greater"))
Moran I test under normality
data: columbus$CRIME
weights: listw n reduced by no-neighbour observations
Moran I statistic standard deviate = 1.1726, p-value = 0.1205
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.014619659 -0.024390244 0.001106761
>
> # the neighbors and weights of the first observation
> ## original listw
> card(listw0$neighbours)[[1]]
[1] 20
> listw0$neighbours[[1]]
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
> listw0$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> ## modified listw
> card(listw$neighbours)[[1]]
[1] 14
> listw$neighbours[[1]]
[1] 8 9 10 11 12 13 14 15 16 17 18 19 20 21
> listw$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> # overview of listw0 and listw
> listw0
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 1078
Percentage nonzero weights: 44.89796
Average number of links: 22
2 disjoint connected subgraphs
Weights style: B
Weights constants summary:
n nn S0 S1 S2
B 49 2401 1078 2156 110544
> listw
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 784
Percentage nonzero weights: 32.65306
Average number of links: 16
7 regions with no links:
43 44 45 46 47 48 49
8 disjoint connected subgraphs
Weights style: B
Weights constants summary:
n nn S0 S1 S2
B 42 1764 784 1568 65856
Thank you in advance and sorry for the inconvenience.
Hongyun Wang
Sr. Data Scientist � Contracted
Bayer AG
hongyun.wang.ext using bayer.com
________________________________
The information contained in this e-mail is for the excl...{{dropped:14}}
More information about the R-sig-Geo
mailing list