[R-sig-Geo] rgdal::writeOGR with driver='ESRI Shapefile' converts Polygon object into a hole

Roger Bivand Roger@B|v@nd @end|ng |rom nhh@no
Mon Aug 19 16:48:08 CEST 2019


Hi Phil,

On Sun, 18 Aug 2019, Phil Haines wrote:

> Hi Roger,
>
> I originally encountered this issue while trying to reduce the size of
> German postal code boundaries. I used rmapshaper::ms_simplify() which
> introduced the polygons sharing a common edge. I definitely didn't
> need them to be separate polygons, and would happily have merged them
> had I been more familiar with gUnaryUnion().

If the German postal code boundaries are open, could you please put (an 
affected subset) on an URL, and provide a short script to replicate the 
workflow. Then we can engage with the rmapshaper maintainer, and see 
whether the underlying problem in the workflow is in the js library or its 
R wrapper. I've opened: https://github.com/ateucher/rmapshaper/issues/89.

>
> I apologise if my question has resulted in unnecessary effort on your
> part. I reported the behaviour because I found it surprising. However,
> I now understand that this example is not a valid simple feature
> geometry and that the solution is to take steps to ensure I have valid
> geometries, and to repair if not.
>

No need to apologise!! Your reproducible example was excellent, without it 
you wouldn't have contributed to getting the internal shapelib code 
changed to make it less restrictive in future releases of GDAL, 
potentially benefitting lots of users (who really should drop ESRI 
shapefiles, and/or should check geometries for validity, but ... whose 
work you've helped save). By the way, ESRI would be very happy if 
shapefiles stopped being produced in current workflows, and that new files 
should rather be GPKG.

> Additionally I must confess that I only use the ESRI Shapefile driver
> because I don't know any better... My use cases are reading and
> writing from R (usually to produce maps) and occasionally opening in
> QGIS. I can happily switch to any format that supports this workflow.
>

GPKG is now very broadly supported, is SF-compliant, and has the big 
user-facing benefits of not having field name length restrictions, and 
many fewer encoding issues for field names and string values.

Best wishes,

Roger

> Thank you very much for your time,
> Phil
>
> On Sun, 18 Aug 2019 at 13:59, Roger Bivand <Roger.Bivand using nhh.no> wrote:
>>
>> The reason that the problem occurred is that a MULTIPOLYGON with two
>> exterior rings becomes invalid if the exterior rings touch along an edge
>> (this case). It is important to know the use case, to see whether:
>>
>> library(rgeos)
>> writeWKT(Ps1)
>> gIsValid(Ps1, reason=TRUE)
>> Ps1a <- gUnaryUnion(gBuffer(Ps1, width=0))
>> gIsValid(Ps1a, reason=TRUE)
>> writeWKT(Ps1a)
>>
>> or equivalently in sf for sf objects, should be applied before trying to
>> write the object out to file with this driver.
>>
>> However, because drivers that are compliant with the simple features
>> standard (which bans exterior rings sharing edges) have been permissive
>> and do round-trip this invalid object, a relaxation in the OGR ESRI
>> shapefile driver has been provided and may be included in a future
>> release.
>>
>> We need to know (see these issues):
>>
>> https://github.com/r-spatial/sf/issues/1130
>> https://github.com/OSGeo/gdal/issues/1787
>>
>> why it was desirable to write out this object using this driver? Could an
>> alternative driver have been used, or is ESRI shapefile the only format
>> used in the workflow?
>>
>> If it has to be this driver, could the workflow be changed to repair
>> degenerate cases before writing? If using sp classes, rgeos may be used to
>> test for and probably repair such geometries before they reach
>> rgdal::writeOGR() for this driver. Adding code to sf and rgdal to trap
>> degenerate cases does encumber all users with valid geometries with
>> the time wasted on extra checking, so building checks into sf and rgdal is
>> not desirable.
>>
>> I hope it is possible to find out more about the use case quickly, to pass
>> on to GDAL developers to help motivate a relaxation in their current
>> policy with regard to this driver, and to encourage them to include the
>> fix branch in a future release of GDAL.
>>
>> Roger
>>
>> On Sat, 17 Aug 2019, Roger Bivand wrote:
>>
>>> Please follow up both here and on:
>>>
>>> https://github.com/r-spatial/sf/issues/1130
>>>
>>> as the problem is also seen in the sf package using the same GDAL ESRI
>>> Shapefile driver.
>>>
>>> Roger
>>>
>>> On Fri, 16 Aug 2019, Roger Bivand wrote:
>>>
>>>>  On Fri, 16 Aug 2019, Roger Bivand wrote:
>>>>
>>>>>   On Tue, 13 Aug 2019, Phil Haines wrote:
>>>>>
>>>>>>    Dear list,
>>>>>>
>>>>>>    I have a single Polygons object containing multiple Polygon objects
>>>>>>    that share a common border. When I output this using writeOGR() one of
>>>>>>    the Polygon objects becomes a hole, as the following example shows.
>>>>>>
>>>>>>    Create a Polygons object containing two adjoining Polygon objects
>>>>>>>    library(rgdal)
>>>>>>>    r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1))
>>>>>>>    r2 <- r1; r2[,1] <- r2[,1]+1
>>>>>>>    Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1)
>>>>>>>    SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)),
>>>>>>>    data.frame(Example=c("Minimal")))
>>>>>>
>>>>>>    Perform a write/readOGR() cycle
>>>>>>>    fn <- tempfile()
>>>>>>>    writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile')
>>>>>>>    SPDF2 <- readOGR(dsn=fn,layer='test')
>>>>>>
>>>>>>    Second Polygon object is now a hole
>>>>>>>    sapply(SPDF2 using polygons[[1]]@Polygons,slot,"hole")
>>>>>>    [1] FALSE  TRUE
>>>>>>
>>>>>>    I see from the sp documentation that "Polygon objects belonging to a
>>>>>>    Polygons object should either not overlap one-other, or should be
>>>>>>    fully included" but I am not sure how this relates to bordering
>>>>>>    Polygon objects. I would welcome any advice as to whether what I am
>>>>>>    asking of writeOGR is reasonable?
>>>>>
>>>>>   The problem is with the 'ESRI Shapefile' representation and driver:
>>>>>
>>>>>   library(rgdal)
>>>>>   r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1))
>>>>>   r2 <- r1; r2[,1] <- r2[,1]+1
>>>>>   Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1)
>>>>>   SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)),
>>>>>   data.frame(Example=c("Minimal")))
>>>>>   sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>   sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "ringDir")
>>>>>
>>>>>   # which constructs a MULTIPOLYGON object:
>>>>>
>>>>>   rgeos::writeWKT(SPDF)
>>>>>   library(sf)
>>>>>   st_as_text(st_geometry(st_as_sf(SPDF)))
>>>>>
>>>>> #    The 'ESRI Shapefile' driver is not Simple-Feature compliant (it
>>>>> #    pre-dates it), so the failure occurs by seeing the second exterior
>>>>> #    ring
>>>>> #    as an interior ring
>>>>>
>>>>>   fn <- tempfile()
>>>>>   writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile')
>>>>>   SPDF2 <- readOGR(dsn=fn, layer='test')
>>>>>   sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>   sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, "ringDir")
>>>>>   rgeos::writeWKT(SPDF2)
>>>>>   st_as_text(st_geometry(st_as_sf(SPDF2)))
>>>>>
>>>>>   # This happens with sf too, using the same GDAL driver:
>>>>>
>>>>>   sf2 <- st_read(dsn=fn, layer='test')
>>>>>   st_geometry(sf2)
>>>>>   library(sf)
>>>>>   st_as_text(st_geometry(st_as_sf(sf2)))
>>>>>   rgeos::writeWKT(as(sf2, "Spatial"))
>>>>>
>>>>>   # Adding the comment fix doesn't help:
>>>>>
>>>>>   comment(slot(SPDF, "polygons")[[1]])
>>>>>   SPDF_c <- rgeos::createSPComment(SPDF)
>>>>>   comment(slot(SPDF_c, "polygons")[[1]])
>>>>>   writeOGR(SPDF_c, fn, layer='test_c', driver='ESRI Shapefile',
>>>>>     verbose=TRUE)
>>>>>
>>>>> #    reports
>>>>> #    Object initially classed as: wkbPolygon
>>>>> #    SFS comments in Polygons objects
>>>>> #    Object reclassed as: wkbMultiPolygon
>>>>>
>>>>>   SPDF2_c <- readOGR(dsn=fn, layer='test_c')
>>>>>   sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>   sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot,
>>>>>   "ringDir")
>>>>>   rgeos::writeWKT(SPDF2_c)
>>>>>   st_as_text(st_geometry(st_as_sf(SPDF2_c)))
>>>>>
>>>>>
>>>>>
>>>>>   # If the input object is written out with the GeoPackage driver:
>>>>>
>>>>>   fn1 <- tempfile(fileext=".gpkg")
>>>>>   writeOGR(SPDF, fn1, layer="test", driver='GPKG')
>>>>>   sf2a <- st_read(dsn=fn1,layer='test')
>>>>>   st_coordinates(st_geometry(sf2a))
>>>>>   SPDF2a <- readOGR(dsn=fn1)
>>>>>   sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>   sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, "ringDir")
>>>>>   rgeos::writeWKT(SPDF2a)
>>>>>   st_as_text(st_geometry(st_as_sf(SPDF2a)))
>>>>>
>>>>>   # the issue is resolved. If we separate the exterior rings:
>>>>>
>>>>>   r2a <- r1; r2a[,1] <- r2a[,1]+1.00001
>>>>>   Ps1a = Polygons(list(Polygon(r1),Polygon(r2a)),ID=1)
>>>>>   SPDFa = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1a)),
>>>>>   data.frame(Example=c("Minimal")))
>>>>>   fna <- tempfile()
>>>>>   writeOGR(SPDFa, fna, layer='test', driver='ESRI Shapefile')
>>>>>   SPDF2_a <- readOGR(dsn=fna, layer='test')
>>>>>   sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>   sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot,
>>>>>   "ringDir")
>>>>>   rgeos::writeWKT(SPDF2_a)
>>>>>   st_as_text(st_geometry(st_as_sf(SPDF2_a)))
>>>>>
>>>>>   # we are OK as the two exterior rings do not touch.
>>>>>
>>>>>   # does using sf make a difference?
>>>>>
>>>>>   fn_s <- tempfile(fileext=".shp")
>>>>>   st_write(st_as_sf(SPDF), dsn=fn_s)
>>>>>   sf_in <- st_read(fn_s)
>>>>>   st_as_text(st_geometry(st_as_sf(sf_in)))
>>>>>
>>>>>   # No
>>>>>
>>>>>   fn_s <- tempfile(fileext=".shp")
>>>>>   st_write(st_as_sf(SPDF_c), dsn=fn_s)
>>>>>   sf_in_c <- st_read(fn_s)
>>>>>   st_as_text(st_geometry(st_as_sf(sf_in_c)))
>>>>>
>>>>>   # nor with the pretend-SF-compliant comment set either.
>>>>>
>>>>>   So the weakness is in the "ESRI Shapefile" write driver, or possibly in
>>>>>   the OGRGeometryFactory::organizePolygons() function in GDAL used in
>>>>>   OGR_write() (a C++ function) called by writeOGR(). If sf::st_write()
>>>>>   also
>>>>>   calls OGRGeometryFactory::organizePolygons(), we'd maybe consider that
>>>>>   it
>>>>>   has a weakness for the "ESRI Shapefile" driver, but which does not
>>>>>   affect
>>>>>   SF-compliant drivers.
>>>>
>>>>  Without the comment set, OGRGeometryFactory::organizePolygons() is used;
>>>>  with it set, OGRGeometryFactory::organizePolygons() is not used, because
>>>>  the object is declared as two exterior rings. In both cases, we have the
>>>>  output object written out and read back in incorrectly with the ESRI
>>>>  shapefile driver, but SF-compliant drivers round-trip (in the test
>>>>  GeoJSON), correctly.
>>>>
>>>>  It is likely that the changes made in 2015 to accommodate GeoJSON led to
>>>>  this possible regression for the ESRI Shapefile driver. I'm adding this
>>>>  geometry to tests/tripup.R and data files; without code changes the hole
>>>>  slot is wrong and the ring direction is changes to match, so the
>>>>  coordinates change too.
>>>>
>>>>  Reading using the deprecated maptools::readShapeSpatial() also gets a hole
>>>>  in the second external ring. However, writing with deprecated
>>>> maptools::  writeSpatialShape() yields a shapefile that when read with
>>>> maptools::  readShapeSpatial() gives the correct two exterior ring, no hole
>>>>  object. When read with sf::st_read() and rgdal::readOGR(), the object is
>>>>  also correct. So the problem definitely lies in rgdal::writeOGR(), and
>>>>  sf::st_write() - roundtripping with sf::st_write() and sf::st_read()
>>>>  degrades from MULTIPOLYGON to POLYGON with the ESRI Shapefile driver.
>>>>
>>>>  Roger
>>>>
>>>>>
>>>>>   This is probably related to a similar but inverse problem with the
>>>>>   SF-compliant GeoJSON driver in 2015:
>>>>>
>>>>>   https://stat.ethz.ch/pipermail/r-sig-geo/2015-October/023609.html
>>>>>
>>>>>   continued the next month in:
>>>>>
>>>>>   https://stat.ethz.ch/pipermail/r-sig-geo/2015-November/023656.html
>>>>>
>>>>>   The details are in this SVN diff
>>>>>
>>>>>   https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=555&r2=571
>>>>>
>>>>>   up to the end og the list thread, and this one from then until now:
>>>>>
>>>>>   https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=571&r2=733
>>>>>
>>>>>   Summary: could you change drivers, or is it really necessary to fix an
>>>>>   EOL
>>>>>   problem? What is your use case?
>>>>>
>>>>>>
>>>>>>    Thanks in advance for your time,
>>>>>
>>>>>   Thanks for a complete example,
>>>>>
>>>>>   Roger
>>>>>
>>>>>>    Phil
>>>>>>
>>>>>>>    sessionInfo()
>>>>>>    R version 3.5.1 (2018-07-02)
>>>>>>    Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>>    Running under: Windows >= 8 x64 (build 9200)
>>>>>>
>>>>>>    Matrix products: default
>>>>>>
>>>>>>    locale:
>>>>>>    [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
>>>>>>    Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>>>>>>                            LC_TIME=English_United Kingdom.1252
>>>>>>
>>>>>>    attached base packages:
>>>>>>    [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>>    other attached packages:
>>>>>>    [1] rgdal_1.4-4 sp_1.3-1
>>>>>>
>>>>>>    loaded via a namespace (and not attached):
>>>>>>    [1] compiler_3.5.1  tools_3.5.1     yaml_2.2.0      grid_3.5.1
>>>>>>    lattice_0.20-35
>>>>>>
>>>>>>    _______________________________________________
>>>>>>    R-sig-Geo mailing list
>>>>>>    R-sig-Geo using r-project.org
>>>>>>    https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en



More information about the R-sig-Geo mailing list