[R-sig-Geo] Spatial autocorrelation test in a dataset with multiple observations per region

Hongyun Wang hongyun@w@ng@ext @end|ng |rom b@yer@com
Sat Dec 21 05:03:40 CET 2024


Prof. Bivand,

Thank you for your response. I changed data source and data manipulation methods and copied only codes to this email. I also checked that the modified columbus is still a sf/data.frame object.

My motivation using modified columbus dataset that has multiple observations per region/polygon is to find correct ways to complete following two tasks:

  1.  Creation of neighbors list and weights list.
  2.  Moran’s I test for spatial autocorrelation for: 1) a primary variable, 2) residuals from an estimated linear model.

I looked at nb2blocknb function’s details and examples and thought this function can be used to create blocked up neighbors list in my case, because this modified dataset is like the post codes example described in function’s details. On the other hand, as stated in the documentation: “observations belonging to each unique location are observation neighbours”, and observation ids will be added to the neighbors list as attribute region.id. I think modifying the neighbors lists and weights list is not necessary, because two observations within a unique location can be “observation neighbours”.

In the codes below, I also compared two weights list objects: listw.block and listw0. They are not identical, but they have the same neighbors list and weights list. And when they were used in Moran’s I tests, the tests returned same z-scores/p-values. Is listw.block object created correctly? Can listw0 replace listw.block?

library(spdep)

# read dataset and modify it: only 7 regions, variable observations per region.
columbus <- st_read(system.file("shapes/columbus.gpkg", package="spData")[1], quiet=TRUE)
columbus$regionId <- rep(as.character(1:7), 4:10)
columbus$geom <- rep(columbus$geom[1:7], 4:10)

# check class of object columbus
class(columbus)
columbus_ <- st_make_valid(st_as_sf(columbus, wkt = "geom"))
class(columbus_)
identical(columbus, columbus_)

# create neighbor list using 7 unique regions/polygons
nb7 <- poly2nb(as(unique(columbus[, "geom"]), "Spatial"))

# block up neighbor list for location-less observations
if (all(sort(as.character(attr(nb7, "region.id"))) == sort(unique(as.character(columbus$regionId))))) {
  block.nb <- nb2blocknb(nb7, columbus$regionId)
} else {
  stop("ID tags don't match!")
}

# create spatial weights for neighbor list
(listw.block <-  nb2listw(block.nb, zero.policy=T, style="B"))

# check number of neighbors of each observation
card(listw.block$neighbours)

# the neighbors and weights of the first observation
card(listw.block$neighbours)[[1]]
listw.block$neighbours[[1]]
listw.block$weights[[1]]

# Moran's I test for spatial autocorrelation in a primary variable CRIME
(m.test <- moran.test(columbus$CRIME, listw.block,
                      randomisation=FALSE, alternative="greater"))

# Moran's I test for spatial autocorrelation in residuals from an estimated linear model
crime.lm <- lm(CRIME ~ HOVAL + INC, data = columbus)
(m.test_ <- lm.morantest(crime.lm, listw.block, alternative="greater"))

########################################################################################

# create neighbor list directly from full columbus dataset
nblist <- poly2nb(as(columbus, "Spatial"))
listw0 <- nb2listw(nblist, zero.policy=T, style="B")

# Moran's I test for spatial autocorrelation in a primary variable CRIME
(m.test <- moran.test(columbus$CRIME, listw0,
                      randomisation=FALSE, alternative="greater"))

# Moran's I test for spatial autocorrelation in residuals from an estimated linear model
crime.lm <- lm(CRIME ~ HOVAL + INC, data = columbus)
(m.test_ <- lm.morantest(crime.lm, listw0, alternative="greater"))

# list.block and list0 have same neighbors and weights, although they are not identical
all(card(listw0$neighbours) == card(listw.block$neighbours))
do.call(all, lapply(seq(length(listw0$neighbours)), function(x) all(listw0$neighbours[[x]] == listw.block$neighbours[[x]])))
do.call(all, lapply(seq(length(listw0$neighbours)), function(x) all(listw0$weights[[x]] == listw.block$weights[[x]])))
identical(listw.block, listw0)

Thank you for your help.

Hongyun


From: Roger Bivand <Roger.Bivand using nhh.no>
Date: Thursday, December 19, 2024 at 3:22 AM
To: Hongyun Wang <hongyun.wang.ext using bayer.com>, r-sig-geo using r-project.org <r-sig-geo using r-project.org>
Subject: Re: Spatial autocorrelation test in a dataset with multiple observations per region
[You don't often get email from roger.bivand using nhh.no. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]

This is not easy to disambiguate.

First, you have copied from the output to your email, so that replicating your code is contorted - provide the code, either instead of, or in addition to the output.

Second, shapes/columbus.shp is deprecated, soon to be defunct, use shapes/columbus.gpkg.

Third, you use pipes and dplyr, so that examining intermediate objects is not possible; I would be most grateful if you avoided dplyr::mutate, as sf objects are spatial, not general. I do not think that you have created the object you describe. I also fail to grasp your motivation, do you have multiple observations with unknown position within larger enclosing polygons, as in ?nb2blocknb?

Roger

--
Roger Bivand
Emeritus Professor
Norwegian School of Economics
Postboks 3490 Ytre Sandviken, 5045 Bergen, Norway
Roger.Bivand using nhh.no

________________________________________
From: R-sig-Geo <r-sig-geo-bounces using r-project.org> on behalf of Hongyun Wang <hongyun.wang.ext using bayer.com>
Sent: 18 December 2024 23:26
To: r-sig-geo using r-project.org
Subject: [R-sig-Geo] Spatial autocorrelation test in a dataset with multiple observations per region

[You don't often get email from hongyun.wang.ext using bayer.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]

Hello Everyone,

I have one question on the spatial autoregression tests on primary variables or residuals of linear regression models of a dataset that has multiple observations per region. Here I use columbus dataset and moran.test() on a primary variable CRIME as illustration.

I modified the dataset to have only 7 regions, not 49 regions in original dataset, and each region has 7 observations. I used poly2nb() and nb2listw() to create neighbors and weights list of each region. But after examining the list carefully, I found that two observations within the same region have weight 1. This looks not right. I think poly2nb function treats each observation (a row in the dataset) as a region. Then I modified the neighbors and weights list to have weight 0 for two observations that are in the same region. The moran.test results from original listw object (listw0) and modified listw object (listw) have different Z-scores and p-values. The which moran.test is correct, or are both incorrect?

> library(dplyr)
> library(spdep)
>
> # read dataset and modify it: only 7 regions, 7 observations per region.
> columbus <- st_read(system.file("shapes/columbus.shp", package="spData")[1], quiet=TRUE) %>%
+   mutate(regionId = rep(LETTERS[1:7], each = 7),
+          geometry = rep(.$geometry[1:7], each = 7))
> columbus %>% as.data.frame() %>% select(regionId,geometry) %>% unique() %>% dim()
[1] 7 2
>
> # neighbor list and weights
> nblist <- poly2nb(as(columbus, "Spatial"))
> listw0 <- listw <-  nb2listw(nblist, zero.policy=T, style="B")
>
> # 1.Moran I test on primary variable CRIME using original weights list
> (m.test <- moran.test(columbus$CRIME, listw0,
+                       randomisation=FALSE, alternative="greater"))

     Moran I test under normality

data:  columbus$CRIME
weights: listw0

Moran I statistic standard deviate = 2.6121, p-value = 0.004499
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
     0.0497371251     -0.0208333333      0.0007298841

>
> # change weights in listw: two observations within the same region have weight 0.
> columbus$ID = seq(nrow(columbus))
> for (i in seq_along(listw$neighbours)){
+   listw$neighbours[[i]] = setdiff(listw$neighbours[[i]],  subset(as.data.frame(columbus), regionId == as.data.frame(columbus)[i, "regionId"])$ID)
+   listw$weights[[i]] = rep(1, length(listw$neighbours[[i]]))
+
+   if (length(listw$neighbours[[i]]) == 0) {
+     listw$neighbours[[i]] = 0L
+     listw$weights[i] = list(NULL)
+   }
+ }
>
> # 2.Moran I test on primary variable CRIME using modified weights list
> (m.test <- moran.test(columbus$CRIME, listw,
+                       randomisation=FALSE, alternative="greater"))

     Moran I test under normality

data:  columbus$CRIME
weights: listw  n reduced by no-neighbour observations


Moran I statistic standard deviate = 1.1726, p-value = 0.1205
alternative hypothesis: greater
sample estimates:
Moran I statistic       Expectation          Variance
      0.014619659      -0.024390244       0.001106761

>
> # the neighbors and weights of the first observation
> ## original listw
> card(listw0$neighbours)[[1]]
[1] 20
> listw0$neighbours[[1]]
[1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
> listw0$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> ## modified listw
> card(listw$neighbours)[[1]]
[1] 14
> listw$neighbours[[1]]
[1]  8  9 10 11 12 13 14 15 16 17 18 19 20 21
> listw$weights[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> # overview of listw0 and listw
> listw0
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 1078
Percentage nonzero weights: 44.89796
Average number of links: 22
2 disjoint connected subgraphs

Weights style: B
Weights constants summary:
   n   nn   S0   S1     S2
B 49 2401 1078 2156 110544
> listw
Characteristics of weights list object:
Neighbour list object:
Number of regions: 49
Number of nonzero links: 784
Percentage nonzero weights: 32.65306
Average number of links: 16
7 regions with no links:
43 44 45 46 47 48 49
8 disjoint connected subgraphs

Weights style: B
Weights constants summary:
   n   nn  S0   S1    S2
B 42 1764 784 1568 65856

Thank you in advance and sorry for the inconvenience.

Hongyun Wang

Sr. Data Scientist � Contracted
Bayer AG
hongyun.wang.ext using bayer.com

________________________________

The information contained in this e-mail is for the excl...{{dropped:14}}

________________________________

The information contained in this e-mail is for the exclusive use of the intended recipient(s) and may be confidential, proprietary, and/or legally privileged.  Inadvertent disclosure of this message does not constitute a waiver of any privilege.  If you receive this message in error, please do not directly or indirectly use, print, copy, forward, or disclose any part of this message.  Please also delete this e-mail and all copies and notify the sender.  Thank you.

________________________________


	[[alternative HTML version deleted]]



More information about the R-sig-Geo mailing list