[R-sig-Geo] How does sf do rgeos::gUnaryUnion or maptools::unionSpatialPolygons?

Roger Bivand Roger.Bivand at nhh.no
Tue May 9 20:39:45 CEST 2017


On Tue, 9 May 2017, Edzer Pebesma wrote:

>
>
> On 09/05/17 15:32, Roger Bivand wrote:
>> While https://github.com/edzer/sfr/wiki/migrating is very helpful,
>> rgeos::gUnaryUnion is not the typical use case of sf::st_union. The
>> typical use case, from ASDAR 1st edition, chapter 5 and the maptools
>> "combine_maptools" vignette (the shapefiles are shipped with maptools),
>> is grouping features that should belong to the same statistical entity:
>>
>> library(maptools)
>> vignette("combine_maptools")
>>
>> ###################################################
>> ### chunk number 16:
>> ###################################################
>>
>> library(sf)
>> nc90 <- st_read(system.file("shapes/co37_d90.shp", package = "maptools"))
>> st_crs(nc90) <- "+proj=longlat +datum=NAD27"
>> table(table(paste(nc90$ST, nc90$CO, sep="")))
>>
>> The point is that two counties are represented by multiple features of
>> "POLYGON" objects, rather than single "MULTIPOLYGON" objects, in the
>> input shapefile. I've tried doing:
>>
>> ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
>> nc90a <- st_cast(nc90, to="MULTIPOLYGON", ids=as.integer(ids))
>>
>> but:
>>
>>> dim(nc90a)
>> [1] 104   9
>>
>> all turned into "MULTIPOLYGON", but not grouped, even though I think
>> ids= are as they should be:
>>
>>> table(table(as.integer(ids)))
>>
>>  1  2  4
>> 98  1  1
>
> this is at least documented: nc90 is of class `sf`, and st_cast.sf
> has no ids argument; ... is ignored. If it would merge, it'd need
> some guidance what to do with non-geometry feature attributes.
>

OK, thanks! I was hoping one or other of the many contributors to sf would 
jump in, given the amount of time you commit to moving sf forward!

>>
>> This may be to avoid dropping data.frame rows. It looks as though I can
>> get there using:
>>
>> nc90a <- st_cast(st_geometry(nc90), to="MULTIPOLYGON", ids=as.integer(ids))
>>
>>> length(nc90a)
>> [1] 100
>>
>> but all are "MULTIPOLYGON", not a mixture of "POLYGON" and
>> "MULTIPOLYGON" features, and I've no idea which is which - that is how
>> the order of nc90a relates to the counties of nc90. How do I associate
>> the features in nc90a with their county ids? Where do i find the ids I
>> gave to st_cast in nc90a?
>
> st_cast uses base::split, which converts ids to a factor,
> and returns a list with the order of the levels of that factor
> (alphabetically?). Neither convenient, nor documented. I guess more
> intuitive would be to squeeze, but keep original order, right?
>
> I'd suggest to use instead
>
> nc90a = aggregate(nc90, list(ids = ids), head, n = 1)
>
> which returns a GEOMETRY sf, with 2 MULTIPOLYGON and 98 POLYGON. You
> could st_cast that to MULTIPOLYGON.

This works and keeps the output as c("sf", "data.frame"):

ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
t0 <- aggregate(nc90, list(ids = ids), head, n = 1)
nc90a <- t0[, c("ids", attr(t0, "sf_column"))]
rm(t0)
# n is an argument to head saying take the first value in each ids-group

>
> Alternatively:
>
> library(dplyr)
> bind_cols(nc90, ids=ids) %>%
> 	group_by(ids) %>%
> 	summarise(ST=head(ST,1), CO=head(CO,1), do_union=FALSE)

This works but returns an object of: c("sf", "tbl_df", "tbl", "data.frame"):

library(dplyr)
nc90b <- summarise(group_by(bind_cols(nc90, data.frame(ids=ids)), ids),
   do_union=FALSE)

or equivalently (for released dplyr):

bind_cols(nc90, data.frame(ids=ids)) %>%
         group_by(ids) %>%
         summarise(do_union=FALSE) -> nc90b

This fails in:

all.equal(nc90b, nc90a, check.attributes=FALSE)
# Error in equal_data_frame(target, current, ignore_col_order = 
# ignore_col_order,  :
#  Can't join on 'geometry' x 'geometry' because of incompatible types 
# (sfc_GEOMETRY, sfc / sfc_GEOMETRY, sfc)

but:

> all.equal(nc90a, nc90b, check.attributes=FALSE)
[1] TRUE

so being c("tbl_df", "tbl") before "data.frame" makes a difference.

Since I always try to check intermediate results, I only found the missing 
data.frame() in bind_cols() by stepping through - is that implicit in the 
development version of dplyr (or sf)?

Sometime the maptools vignette will migrate, when I get so far ...

Thanks again,

Roger

>
> converts straight into MULTIPOLYGON.
>
> I'll update the sp -> sf migration wiki.
>
>>
>> What am I missing? I'm writing here rather than raising an issue on GH
>> because others may want to know too, and Edzer needs others to reply for
>> him if they know the answer.
>>
>> Puzzled,
>>
>> Roger
>>
>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en



More information about the R-sig-Geo mailing list