[R-sig-Geo] Is a simple feature -friendly version of spdep being developed?
Tiernan Martin
tiernanmartin at gmail.com
Sun May 14 21:39:53 CEST 2017
You’re right about one thing: those of us introduced to the pipe in our
formative years find it hard to escape (everything is better here inside
the pipe - right?) ;)
Thank you for your very informative response. Before reading it, I had a
vague inkling that matrices were part of the challenge of adapting these
tools to work with simple features, but I also thought perhaps all this
issue needed was a little nudge from a user like me. Your response made it
clear that a series of technical decisions needs to be made, and I imagine
making them will require some exploratory testing. Of
course, it is possible that the data.frame + list-col foundation is just
not well-suited to the needs of spatial weighting/testing tools, but I
suspect that there is some valuable insight to be gained from exploring
this possibility further. And as I stated in my first post, going forward I
expect more users will be interested in seeing this sort of integration
happen.
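
To make that concrete, here is the kind of exploratory sketch I have in
mind, holding contiguity neighbours in a list column built from sf's own
st_intersects(). Everything past st_read() is my own improvisation, and
the column name NB is just a placeholder:

library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))

# st_intersects() returns, for each polygon, the integer indices of the
# polygons it intersects; dropping each polygon's own index leaves a
# queen-style contiguity list
nb <- st_intersects(nc, nc)
nc$NB <- lapply(seq_along(nb), function(i) setdiff(nb[[i]], i))

# the fragility discussed below is easy to reproduce: after subsetting,
# the stored indices still refer to rows of the *original* object
nc_sub <- nc[1:50, ]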
> How should we go about this technically? spdep is on R-Forge and is happy
> there. Its present functionality has to be maintained as it stands, it has
> too many reverse dependencies to break. Should it be re-written from
> scratch (with more sparse matrices internally for computation)?
> ...
> We'd need proof of concept with realistically sized data sets (not yet NY
> taxis, but maybe later ...). spdep started as spweights, sptests and
> spdep, and the first two got folded into the third when stable. If weights
> are the first thing to go for, sfweights is where to go first (and port
> the STRtree envelope intersections for contiguities). It could define the
> new classes, and tests and modelling would use them.
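
To make sure I follow the sparse-matrix idea, here is a minimal sketch of
packing a binary neighbour list into a column-oriented sparse matrix with
the Matrix package; the toy neighbour list and all object names are mine,
purely for illustration:

library(Matrix)

# toy neighbour list: element i holds the indices of region i's
# neighbours, in the spirit of an spdep nb object
nb <- list(c(2L, 3L), c(1L, 3L), c(1L, 2L, 4L), 3L)

i <- rep(seq_along(nb), lengths(nb)) # row index of each link
j <- unlist(nb)                      # column index of each link
W <- sparseMatrix(i = i, j = j, x = 1, dims = c(4, 4))

# row-standardisation is a diagonal rescaling ...
W_rs <- Diagonal(x = 1 / rowSums(W)) %*% W

# ... and a spatially lagged variable is a sparse matrix-vector product
x <- c(10, 20, 15, 30)
lag_x <- as.numeric(W_rs %*% x)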
Sounds to me like this package would need to be built from the ground up,
possibly following a similar path to the `sp` development process as you
mentioned. Maybe that presents a challenge with regard to resources (i.e.,
funding, time, etc.). Perhaps this project is a candidate for future
proposals to the R Consortium or other funding sources. I hope that this
post is useful in documenting user interest in seeing the `sf` protocol
integrated into other spatial tools and workflows, and I look forward to
hearing others' thoughts on the matter.
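
On the row-id matching problem you raise, one experiment I would like to
try is keying pre-computed weights on a stable ID rather than on row
order, so that the second lot of data entering the pipe can at least be
checked against the sf object. Using FIPS as the key is just my choice
for this sketch:

library(sf)
library(spdep)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))
nc$ID <- as.character(nc$FIPS) # stable key, independent of row order

# pre-compute the weights once, carrying the IDs as region names
nb <- poly2nb(as(nc, "Spatial"), row.names = nc$ID)
lw <- nb2listw(nb, style = "B", zero.policy = TRUE)

# before any test, confirm the weights still match the (possibly
# reordered or subsetted) attribute table
stopifnot(identical(attr(nb, "region.id"), nc$ID))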
Thanks again,
Tiernan
On Sat, May 13, 2017 at 1:39 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:
> On Fri, 12 May 2017, Tiernan Martin wrote:
>
> > Is anyone thinking about creating an adaptation of the `spdep` package that
> > expects sf-class inputs and works well in a pipeline?
>
> I assume that "this is not a pipeline" is a joke for insiders (in the
> pipe)?
>
> No, there is your issue on the sfr github repository that is relevant for
> contiguous neighbours, but not beyond that:
>
> https://github.com/edzer/sfr/issues/234
>
> An sf is a data.frame, and as such should "just work", like "Spatial"
> objects have, in formula/data settings. The problem is (of course) the
> weights matrix.
>
> Should it be a list column (each row has a list nesting two lists, first
> indices, second non-zero weights), or remain separate as it has been for
> 20 years, or become a column-oriented representation (Matrix package) - a
> nested list like a list column or a listw object is row-oriented. I had
> started thinking about using a sparse column-oriented representation, but
> have not implemented functions accepting them instead of listw objects.
>
> I am very cautious about creating classes for data that were data.frame,
> then sf, and then have the weights built-in. In the simple case it would
> work, but all you have to do is re-order the rows and the link between the
> neighbour ids and row order breaks down; the same applies to subsetting.
>
> The problems to solve first are related to the workflows, and easiest to look
> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
> global and local tests. I think that all or almost all of the NSE verbs
> will cause chaos (mutate, select, group) once weights have been created.
> If there is a way to bind the neighbour IDs to the input sf object rows, a
> list column might be possible, but we have to permit multiple such columns
> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep the
> graph relations intact (and their modifications, like row-standardisation).
>
> If you've seen any examples of how tidy relates to graphs (adjacency lists
> are like nb objects), we could learn from that.
>
> How should we go about this technically? spdep is on R-Forge and is happy
> there. Its present functionality has to be maintained as it stands, it has
> too many reverse dependencies to break. Should it be re-written from
> scratch (with more sparse matrices internally for computation)?
>
> Your example creates weights on the fly for a local G* map, but has n=100,
> not say n=90000 (LA census blocks). Using sf's GEOS-based methods does
> not use STRtrees, which cut the time needed to find contiguous neighbours
> from days to seconds. We ought to pre-compute weights, but this messes up
> the flow of data, because a second lot of data (the weights) has to enter
> the pipe and be matched with the row-ids.
>
> We'd need proof of concept with realistically sized data sets (not yet NY
> taxis, but maybe later ...). spdep started as spweights, sptests and
> spdep, and the first two got folded into the third when stable. If weights
> are the first thing to go for, sfweights is where to go first (and port
> the STRtree envelope intersections for contiguities). It could define the
> new classes, and tests and modelling would use them.
>
> Thanks for starting this discussion.
>
> Roger
>
> >
> > I understand that there is skepticism about the wisdom of adopting the
> > “tidyverse” principles throughout the R package ecosystem, and I share the
> > concern that an over-reliance on any single paradigm could reduce the
> > resilience and diversity of the system as a whole.
> >
> > That said, I believe that the enthusiastic adoption of the `sf` package and
> > the package's connections with widely-used tidyverse packages like `dplyr`
> > and `ggplot2` may result in increased demand for sf-friendly spatial
> > analysis tools. As an amateur who recently started using R as my primary
> > GIS tool, it seems like the tidyverse's preference for dataframes, S3
> > objects, list columns, and pipeline workflows would be well-suited to the
> > field of spatial analysis. Are there some fundamental reasons why the
> > `spdep` tools cannot (or should not) be adapted to the tidyverse "dialect"?
> >
> > Let me put the question in the context of an actual analysis: in February
> > 2017, the pop culture infovis website The Pudding (https://pudding.cool/)
> > published an analysis of regional preferences for Oscar-nominated films in
> > the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
> > the author posted a tutorial explaining the method of “regional smoothing”
> > used to create the article’s choropleths
> > (https://pudding.cool/process/regional_smoothing/).
> >
> > The method relies on several `spdep` functions
> > (https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R).
> > In the code below, I provide a reprex with a smaller dataset included in the
> > `sf` package:
> >
> > library(sf)
> > library(spdep)
> >
> > # North Carolina counties
> > nc <- st_read(system.file("shape/nc.shp", package = "sf"))
> > nc_shp <- as(nc, 'Spatial')
> >
> > coords <- coordinates(nc_shp)
> > IDs <- row.names(as(nc_shp, "data.frame"))
> >
> > # find the five nearest neighbors for each county
> > knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)
> > knn5 <- include.self(knn5)
> >
> > # calculate the G scores
> > localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74),
> >                        listw = nb2listw(knn5, style = "B"),
> >                        zero.policy = TRUE)
> > localGvalues <- round(localGvalues, 3)
> >
> > nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
> >
> > p1 <- spplot(nc_shp, c('NWBIR74'))
> > p2 <- spplot(nc_shp, c('LOCAL_G'))
> > plot(p1, split = c(1, 1, 2, 2), more = TRUE)
> > plot(p2, split = c(1, 2, 2, 2), more = TRUE)
> >
> > Here’s what I imagine that would look like in a tidyverse pipeline (please
> > note that this code is for illustrative purposes and will not run):
> >
> > library(tidyverse)
> > library(purrr)
> > library(sf)
> > library(sfdep) # this package doesn't exist (yet)
> >
> > nc <- st_read(system.file("shape/nc.shp", package = "sf"))
> >
> > nc_g <-
> >   nc %>%
> >   mutate(
> >     # find the nearest neighbors for each county
> >     KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self = TRUE)),
> >     # make a list of the neighbors using the binary method
> >     NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),
> >     # calculate the G scores
> >     LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST, zero.policy = TRUE),
> >     LOCAL_G = round(LOCAL_G, 3)
> >   )
> >
> > We can see that the (hypothetical) tidyverse version reduces the number of
> > intermediate objects and wraps the creation of the G scores into a single
> > code chunk with clear steps.
> >
> > I'd be grateful to hear from the users and developers of the `spdep` and
> > `sf` packages about this topic!
> >
> > Tiernan Martin
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
> http://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en