[R-sig-Geo] Is a simple feature -friendly version of spdep being developed?

Roger Bivand Roger.Bivand at nhh.no
Sat May 13 22:39:10 CEST 2017


On Fri, 12 May 2017, Tiernan Martin wrote:

> Is anyone thinking about creating an adaptation of the `spdep` package that
> expects sf-class inputs and works well in a pipeline?

I assume that "this is not a pipeline" is a joke for insiders (in the 
pipe)?

No; there is your issue on the sfr GitHub repository, which is relevant for 
contiguous neighbours, but not beyond that:

https://github.com/edzer/sfr/issues/234

An sf object is a data.frame, and as such should "just work" in 
formula/data settings, as "Spatial" objects have. The problem is (of 
course) the weights matrix.

Should it be a list column (each row holding a list that nests two lists: 
first the indices, second the non-zero weights)? Should it remain separate, 
as it has been for 20 years? Or should it become a column-oriented 
representation (Matrix package)? A nested list, like a list column or a 
listw object, is row-oriented. I had started thinking about using a sparse 
column-oriented representation, but have not implemented functions 
accepting them instead of listw objects.
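
To make the contrast concrete, a minimal sketch of the two representations 
(assuming spdep's as_dgRMatrix_listw coercion helper and the Matrix 
classes):

library(sf)
library(spdep)
library(Matrix)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))
nb <- poly2nb(as(nc, "Spatial"))      # contiguity neighbours
lw <- nb2listw(nb, style = "W")       # row-oriented: parallel nested lists

str(lw$neighbours[[1]])   # integer indices of the first region's neighbours
str(lw$weights[[1]])      # the matching non-zero weights

W_row <- as_dgRMatrix_listw(lw)       # sparse, row-oriented (dgRMatrix)
W_col <- as(W_row, "CsparseMatrix")   # sparse, column-oriented (dgCMatrix)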

I am very cautious about creating classes for data that start as a 
data.frame, become an sf, and then have the weights built in. In the 
simple case it would work, but all you have to do is re-order the rows and 
the link between the neighbour ids and the row order breaks down; the same 
applies to subsetting.
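
A small illustration of the fragility, since nb objects store neighbours 
as integer row positions:

library(sf)
library(spdep)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))
nb <- poly2nb(as(nc, "Spatial"))

nb[[1]]                        # neighbours of the county in row 1, by row position
nc2 <- nc[sample(nrow(nc)), ]  # re-order the rows
# nb[[1]] still holds the old positions, which now point at whatever
# counties happen to occupy those rows in nc2: the graph link is broken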

The problems to solve first are related to the workflows, and are easiest 
to look at in the univariate case (Moran, Geary, join-count, Mantel, G, 
...) for global and local tests. I think that all or almost all of the NSE 
verbs (mutate, select, group) will cause chaos once weights have been 
created. If there is a way to bind the neighbour IDs to the input sf 
object rows, a list column might be possible, but we have to permit 
multiple such columns (Queen, Rook, ...), and ensure that subsetting and 
row-reordering keep the graph relations intact (and their modifications, 
like row-standardisation).
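
Purely for illustration, a list column keyed on a stable id rather than on 
row position might look like this (region_id and nb_queen are made-up 
names, not an existing interface):

library(sf)
library(spdep)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))
nb <- poly2nb(as(nc, "Spatial"))
lw <- nb2listw(nb, style = "W")

nc$region_id <- as.character(nc$FIPSNO)  # a stable key, surviving re-ordering
nc$nb_queen <- lapply(seq_along(nb), function(i)
  list(ids = nc$region_id[nb[[i]]],      # neighbours by id, not row position
       wts = lw$weights[[i]]))
# further columns (nb_rook, ...) could sit alongside, but subsetting would
# still have to drop dangling ids to keep the graph relations intact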

If you've seen any examples of how tidy workflows relate to graphs 
(adjacency lists are much like nb objects), we could learn from that.

How should we go about this technically? spdep is on R-Forge and is happy 
there. Its present functionality has to be maintained as it stands; it has 
too many reverse dependencies to break. Should it be re-written from 
scratch (with more sparse matrices internally for computation)?

Your example creates weights on the fly for a local G* map, but has n=100, 
not say n=90000 (LA census blocks). Using the sf GEOS-based methods does 
not use STRtrees, which cut the time needed to find contiguous neighbours 
from days to seconds. We ought to pre-compute weights, but this messes up 
the flow of data, because a second lot of data (the weights) has to enter 
the pipe and be matched with the row ids.
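
Roughly like this, sketched with dplyr verbs (weights_tbl and region_id 
are illustrative names, not a proposed API):

library(sf)
library(dplyr)
library(spdep)

nc <- st_read(system.file("shape/nc.shp", package = "sf")) %>%
  mutate(region_id = as.character(FIPSNO))

# the weights are computed once, outside the pipe, keyed by region_id ...
nb <- poly2nb(as(nc, "Spatial"))
weights_tbl <- tibble(region_id = nc$region_id,
                      nb_ids = lapply(nb, function(js) nc$region_id[js]))

# ... and are the second lot of data that re-enters the pipe later; any
# filtering in between has to be reconciled against the stored ids
nc_w <- left_join(nc, weights_tbl, by = "region_id")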

We'd need proof of concept with realistically sized data sets (not yet NY 
taxis, but maybe later ...). spdep started as spweights, sptests and 
spdep, and the first two were folded into the third once they were stable. 
If weights are the first thing to go for, sfweights is where to start (and 
where to port the STRtree envelope intersections for contiguities). It 
could define the new classes, and tests and modelling would then use them.
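
On the contiguity side, sf's DE-9IM predicate already expresses Queen and 
Rook relations (patterns as in the st_relate() documentation); a sketch, 
without any tree-indexed envelope pre-selection, so quadratic in n:

library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))

# DE-9IM patterns for contiguity, as in the st_relate() examples
st_queen <- function(a, b = a) st_relate(a, b, pattern = "F***T****")
st_rook  <- function(a, b = a) st_relate(a, b, pattern = "F***1****")

nb_q <- st_queen(nc)   # a list of integer vectors, close kin to an nb object
# without envelope pre-selection this compares all pairs of geometries,
# which is what porting the STRtree approach would avoid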

Thanks for starting this discussion.

Roger

>
> I understand that there is skepticism about the wisdom of adopting the
> “tidyverse” principles throughout the R package ecosystem, and I share the
> concern that an over-reliance on any single paradigm could reduce the
> resilience and diversity of the system as a whole.
>
> That said, I believe that the enthusiastic adoption of the `sf` package and
> the package's connections with widely-used tidyverse packages like `dplyr`
> and `ggplot2` may result in increased demand for sf-friendly spatial
> analysis tools. As an amateur who recently started using R as my primary
> GIS tool, I have the impression that the tidyverse's preference for data
> frames, S3 objects, list columns, and pipeline workflows would be well
> suited to the field of spatial analysis. Are there some fundamental reasons why the
> `spdep` tools cannot (or should not) be adapted to the tidyverse "dialect"?
>
> Let me put the question in the context of an actual analysis: in February
> 2017, the pop culture infovis website The Pudding (https://pudding.cool/)
> published an analysis of regional preferences for Oscar-nominated films in
> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
> the author posted a tutorial explaining the method of “regional smoothing”
> used to create the article’s choropleths (
> https://pudding.cool/process/regional_smoothing/).
>
> The method relies on several `spdep` functions (
> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R).
> In the code below, I provide a reprex with a smaller dataset included in
> the `sf` package:
>
> library(sf)
> library(spdep)
>
> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North Carolina counties
> nc_shp <- as(nc, 'Spatial')
>
> coords <- coordinates(nc_shp)
> IDs <- row.names(as(nc_shp, "data.frame"))
>
> # find the 5 nearest neighbors of each county, including the county itself
> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)
> knn5 <- include.self(knn5)
>
> # calculate the local G scores
> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74),
>                        listw = nb2listw(knn5, style = "B"),
>                        zero.policy = TRUE)
> localGvalues <- round(localGvalues, 3)
>
> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
>
> p1 <- spplot(nc_shp, c('NWBIR74'))
> p2 <- spplot(nc_shp, c('LOCAL_G'))
> plot(p1, split = c(1, 1, 2, 2), more = TRUE)
> plot(p2, split = c(1, 2, 2, 2), more = TRUE)
>
> Here’s what I imagine that would look like in a tidyverse pipeline (please
> note that this code is for illustrative purposes and will not run):
>
> library(tidyverse)
> library(purrr)
> library(sf)
> library(sfdep) # this package doesn't exist (yet)
>
> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
>
> nc_g <-
>   nc %>%
>   mutate(
>     # find the nearest neighbors for each county
>     KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self = TRUE)),
>     # make a list of the neighbors using the binary method
>     NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),
>     # calculate the G scores
>     LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST, zero.policy = TRUE),
>     LOCAL_G = round(LOCAL_G, 3)
>   )
>
> We can see that the (hypothetical) tidyverse version reduces the number of
> intermediate objects and wraps the creation of the G scores into a single
> code chunk with clear steps.
>
> I'd be grateful to hear from the users and developers of the `spdep` and
> `sf` packages about this topic!
>
> Tiernan Martin
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

