[R-sig-Geo] rgdal::writeOGR with driver='ESRI Shapefile' converts Polygon object into a hole

Roger Bivand Roger@B|v@nd @end|ng |rom nhh@no
Tue Aug 20 11:55:00 CEST 2019


Good, thanks very much!

Roger

On Mon, 19 Aug 2019, Phil Haines wrote:

> Hi Roger,
>
> I have added an example to https://github.com/ateucher/rmapshaper/issues/89
>
> Hopefully it illustrates the behaviour I was seeing from
> rmapshaper::ms_simplify() but please let me know if not.
>
> And thanks for the nudge towards GPKG, I will investigate.
> Phil
>
> On Mon, 19 Aug 2019 at 15:48, Roger Bivand <Roger.Bivand using nhh.no> wrote:
>>
>> Hi Phil,
>>
>> On Sun, 18 Aug 2019, Phil Haines wrote:
>>
>>> Hi Roger,
>>>
>>> I originally encountered this issue while trying to reduce the size of
>>> German postal code boundaries. I used rmapshaper::ms_simplify() which
>>> introduced the polygons sharing a common edge. I definitely didn't
>>> need them to be separate polygons, and would happily have merged them
>>> had I been more familiar with gUnaryUnion().
>>
>> If the German postal code boundaries are open, could you please put (an
>> affected subset) on an URL, and provide a short script to replicate the
>> workflow. Then we can engage with the rmapshaper maintainer, and see
>> whether the underlying problem in the workflow is in the js library or its
>> R wrapper. I've opened: https://github.com/ateucher/rmapshaper/issues/89.
>>
>>>
>>> I apologise if my question has resulted in unnecessary effort on your
>>> part. I reported the behaviour because I found it surprising. However,
>>> I now understand that this example is not a valid simple feature
>>> geometry and that the solution is to take steps to ensure I have valid
>>> geometries, and to repair if not.
>>>
>>
>> No need to apologise!! Your reproducible example was excellent, without it
>> you wouldn't have contributed to getting the internal shapelib code
>> changed to make it less restrictive in future releases of GDAL,
>> potentially benefitting lots of users (who really should drop ESRI
>> shapefiles, and/or should check geometries for validity, but ... whose
>> work you've helped save). By the way, ESRI would be very happy if
>> shapefiles stopped being produced in current workflows, and that new files
>> should rather be GPKG.
>>
>>> Additionally I must confess that I only use the ESRI Shapefile driver
>>> because I don't know any better... My use cases are reading and
>>> writing from R (usually to produce maps) and occasionally opening in
>>> QGIS. I can happily switch to any format that supports this workflow.
>>>
>>
>> GPKG is now very broadly supported, is SF-compliant, and has the big
>> user-facing benefits of not having field name length restrictions, and
>> many fewer encoding issues for field names and string values.
>>
>> Best wishes,
>>
>> Roger
>>
>>> Thank you very much for your time,
>>> Phil
>>>
>>> On Sun, 18 Aug 2019 at 13:59, Roger Bivand <Roger.Bivand using nhh.no> wrote:
>>>>
>>>> The reason that the problem occurred is that a MULTIPOLYGON with two
>>>> exterior rings becomes invalid if the exterior rings touch along an edge
>>>> (this case). It is important to know the use case, to see whether:
>>>>
>>>> library(rgeos)
>>>> writeWKT(Ps1)
>>>> gIsValid(Ps1, reason=TRUE)
>>>> Ps1a <- gUnaryUnion(gBuffer(Ps1, width=0))
>>>> gIsValid(Ps1a, reason=TRUE)
>>>> writeWKT(Ps1a)
>>>>
>>>> or equivalently in sf for sf objects, should be applied before trying to
>>>> write the object out to file with this driver.
>>>>
>>>> However, because drivers that are compliant with the simple features
>>>> standard (which bans exterior rings sharing edges) have been permissive
>>>> and do round-trip this invalid object, a relaxation in the OGR ESRI
>>>> shapefile driver has been provided and may be included in a future
>>>> release.
>>>>
>>>> We need to know (see these issues):
>>>>
>>>> https://github.com/r-spatial/sf/issues/1130
>>>> https://github.com/OSGeo/gdal/issues/1787
>>>>
>>>> why it was desirable to write out this object using this driver? Could an
>>>> alternative driver have been used, or is ESRI shapefile the only format
>>>> used in the workflow?
>>>>
>>>> If it has to be this driver, could the workflow be changed to repair
>>>> degenerate cases before writing? If using sp classes, rgeos may be used to
>>>> test for and probably repair such geometries before they reach
>>>> rgdal::writeOGR() for this driver. Adding code to sf and rgdal to trap
>>>> degenerate cases does encumber all users with valid geometries with
>>>> the time wasted on extra checking, so building checks into sf and rgdal is
>>>> not desirable.
>>>>
>>>> I hope it is possible to find out more about the use case quickly, to pass
>>>> on to GDAL developers to help motivate a relaxation in their current
>>>> policy with regard to this driver, and to encourage them to include the
>>>> fix branch in a future release of GDAL.
>>>>
>>>> Roger
>>>>
>>>> On Sat, 17 Aug 2019, Roger Bivand wrote:
>>>>
>>>>> Please follow up both here and on:
>>>>>
>>>>> https://github.com/r-spatial/sf/issues/1130
>>>>>
>>>>> as the problem is also seen in the sf package using the same GDAL ESRI
>>>>> Shapefile driver.
>>>>>
>>>>> Roger
>>>>>
>>>>> On Fri, 16 Aug 2019, Roger Bivand wrote:
>>>>>
>>>>>>  On Fri, 16 Aug 2019, Roger Bivand wrote:
>>>>>>
>>>>>>>   On Tue, 13 Aug 2019, Phil Haines wrote:
>>>>>>>
>>>>>>>>    Dear list,
>>>>>>>>
>>>>>>>>    I have a single Polygons object containing multiple Polygon objects
>>>>>>>>    that share a common border. When I output this using writeOGR() one of
>>>>>>>>    the Polygon objects becomes a hole, as the following example shows.
>>>>>>>>
>>>>>>>>    Create a Polygons object containing two adjoining Polygon objects
>>>>>>>>>    library(rgdal)
>>>>>>>>>    r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1))
>>>>>>>>>    r2 <- r1; r2[,1] <- r2[,1]+1
>>>>>>>>>    Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1)
>>>>>>>>>    SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)),
>>>>>>>>>    data.frame(Example=c("Minimal")))
>>>>>>>>
>>>>>>>>    Perform a write/readOGR() cycle
>>>>>>>>>    fn <- tempfile()
>>>>>>>>>    writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile')
>>>>>>>>>    SPDF2 <- readOGR(dsn=fn,layer='test')
>>>>>>>>
>>>>>>>>    Second Polygon object is now a hole
>>>>>>>>>    sapply(SPDF2 using polygons[[1]]@Polygons,slot,"hole")
>>>>>>>>    [1] FALSE  TRUE
>>>>>>>>
>>>>>>>>    I see from the sp documentation that "Polygon objects belonging to a
>>>>>>>>    Polygons object should either not overlap one-other, or should be
>>>>>>>>    fully included" but I am not sure how this relates to bordering
>>>>>>>>    Polygon objects. I would welcome any advice as to whether what I am
>>>>>>>>    asking of writeOGR is reasonable?
>>>>>>>
>>>>>>>   The problem is with the 'ESRI Shapefile' representation and driver:
>>>>>>>
>>>>>>>   library(rgdal)
>>>>>>>   r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1))
>>>>>>>   r2 <- r1; r2[,1] <- r2[,1]+1
>>>>>>>   Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1)
>>>>>>>   SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)),
>>>>>>>   data.frame(Example=c("Minimal")))
>>>>>>>   sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>   sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "ringDir")
>>>>>>>
>>>>>>>   # which constructs a MULTIPOLYGON object:
>>>>>>>
>>>>>>>   rgeos::writeWKT(SPDF)
>>>>>>>   library(sf)
>>>>>>>   st_as_text(st_geometry(st_as_sf(SPDF)))
>>>>>>>
>>>>>>> #    The 'ESRI Shapefile' driver is not Simple-Feature compliant (it
>>>>>>> #    pre-dates it), so the failure occurs by seeing the second exterior
>>>>>>> #    ring
>>>>>>> #    as an interior ring
>>>>>>>
>>>>>>>   fn <- tempfile()
>>>>>>>   writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile')
>>>>>>>   SPDF2 <- readOGR(dsn=fn, layer='test')
>>>>>>>   sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>   sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, "ringDir")
>>>>>>>   rgeos::writeWKT(SPDF2)
>>>>>>>   st_as_text(st_geometry(st_as_sf(SPDF2)))
>>>>>>>
>>>>>>>   # This happens with sf too, using the same GDAL driver:
>>>>>>>
>>>>>>>   sf2 <- st_read(dsn=fn, layer='test')
>>>>>>>   st_geometry(sf2)
>>>>>>>   library(sf)
>>>>>>>   st_as_text(st_geometry(st_as_sf(sf2)))
>>>>>>>   rgeos::writeWKT(as(sf2, "Spatial"))
>>>>>>>
>>>>>>>   # Adding the comment fix doesn't help:
>>>>>>>
>>>>>>>   comment(slot(SPDF, "polygons")[[1]])
>>>>>>>   SPDF_c <- rgeos::createSPComment(SPDF)
>>>>>>>   comment(slot(SPDF_c, "polygons")[[1]])
>>>>>>>   writeOGR(SPDF_c, fn, layer='test_c', driver='ESRI Shapefile',
>>>>>>>     verbose=TRUE)
>>>>>>>
>>>>>>> #    reports
>>>>>>> #    Object initially classed as: wkbPolygon
>>>>>>> #    SFS comments in Polygons objects
>>>>>>> #    Object reclassed as: wkbMultiPolygon
>>>>>>>
>>>>>>>   SPDF2_c <- readOGR(dsn=fn, layer='test_c')
>>>>>>>   sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>   sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot,
>>>>>>>   "ringDir")
>>>>>>>   rgeos::writeWKT(SPDF2_c)
>>>>>>>   st_as_text(st_geometry(st_as_sf(SPDF2_c)))
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>   # If the input object is written out with the GeoPackage driver:
>>>>>>>
>>>>>>>   fn1 <- tempfile(fileext=".gpkg")
>>>>>>>   writeOGR(SPDF, fn1, layer="test", driver='GPKG')
>>>>>>>   sf2a <- st_read(dsn=fn1,layer='test')
>>>>>>>   st_coordinates(st_geometry(sf2a))
>>>>>>>   SPDF2a <- readOGR(dsn=fn1)
>>>>>>>   sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>   sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, "ringDir")
>>>>>>>   rgeos::writeWKT(SPDF2a)
>>>>>>>   st_as_text(st_geometry(st_as_sf(SPDF2a)))
>>>>>>>
>>>>>>>   # the issue is resolved. If we separate the exterior rings:
>>>>>>>
>>>>>>>   r2a <- r1; r2a[,1] <- r2a[,1]+1.00001
>>>>>>>   Ps1a = Polygons(list(Polygon(r1),Polygon(r2a)),ID=1)
>>>>>>>   SPDFa = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1a)),
>>>>>>>   data.frame(Example=c("Minimal")))
>>>>>>>   fna <- tempfile()
>>>>>>>   writeOGR(SPDFa, fna, layer='test', driver='ESRI Shapefile')
>>>>>>>   SPDF2_a <- readOGR(dsn=fna, layer='test')
>>>>>>>   sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>   sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot,
>>>>>>>   "ringDir")
>>>>>>>   rgeos::writeWKT(SPDF2_a)
>>>>>>>   st_as_text(st_geometry(st_as_sf(SPDF2_a)))
>>>>>>>
>>>>>>>   # we are OK as the two exterior rings do not touch.
>>>>>>>
>>>>>>>   # does using sf make a difference?
>>>>>>>
>>>>>>>   fn_s <- tempfile(fileext=".shp")
>>>>>>>   st_write(st_as_sf(SPDF), dsn=fn_s)
>>>>>>>   sf_in <- st_read(fn_s)
>>>>>>>   st_as_text(st_geometry(st_as_sf(sf_in)))
>>>>>>>
>>>>>>>   # No
>>>>>>>
>>>>>>>   fn_s <- tempfile(fileext=".shp")
>>>>>>>   st_write(st_as_sf(SPDF_c), dsn=fn_s)
>>>>>>>   sf_in_c <- st_read(fn_s)
>>>>>>>   st_as_text(st_geometry(st_as_sf(sf_in_c)))
>>>>>>>
>>>>>>>   # nor with the pretend-SF-compliant comment set either.
>>>>>>>
>>>>>>>   So the weakness is in the "ESRI Shapefile" write driver, or possibly in
>>>>>>>   the OGRGeometryFactory::organizePolygons() function in GDAL used in
>>>>>>>   OGR_write() (a C++ function) called by writeOGR(). If sf::st_write()
>>>>>>>   also
>>>>>>>   calls OGRGeometryFactory::organizePolygons(), we'd maybe consider that
>>>>>>>   it
>>>>>>>   has a weakness for the "ESRI Shapefile" driver, but which does not
>>>>>>>   affect
>>>>>>>   SF-compliant drivers.
>>>>>>
>>>>>>  Without the comment set, OGRGeometryFactory::organizePolygons() is used;
>>>>>>  with it set, OGRGeometryFactory::organizePolygons() is not used, because
>>>>>>  the object is declared as two exterior rings. In both cases, we have the
>>>>>>  output object written out and read back in incorrectly with the ESRI
>>>>>>  shapefile driver, but SF-compliant drivers round-trip (in the test
>>>>>>  GeoJSON), correctly.
>>>>>>
>>>>>>  It is likely that the changes made in 2015 to accommodate GeoJSON led to
>>>>>>  this possible regression for the ESRI Shapefile driver. I'm adding this
>>>>>>  geometry to tests/tripup.R and data files; without code changes the hole
>>>>>>  slot is wrong and the ring direction is changes to match, so the
>>>>>>  coordinates change too.
>>>>>>
>>>>>>  Reading using the deprecated maptools::readShapeSpatial() also gets a hole
>>>>>>  in the second external ring. However, writing with deprecated
>>>>>> maptools::  writeSpatialShape() yields a shapefile that when read with
>>>>>> maptools::  readShapeSpatial() gives the correct two exterior ring, no hole
>>>>>>  object. When read with sf::st_read() and rgdal::readOGR(), the object is
>>>>>>  also correct. So the problem definitely lies in rgdal::writeOGR(), and
>>>>>>  sf::st_write() - roundtripping with sf::st_write() and sf::st_read()
>>>>>>  degrades from MULTIPOLYGON to POLYGON with the ESRI Shapefile driver.
>>>>>>
>>>>>>  Roger
>>>>>>
>>>>>>>
>>>>>>>   This is probably related to a similar but inverse problem with the
>>>>>>>   SF-compliant GeoJSON driver in 2015:
>>>>>>>
>>>>>>>   https://stat.ethz.ch/pipermail/r-sig-geo/2015-October/023609.html
>>>>>>>
>>>>>>>   continued the next month in:
>>>>>>>
>>>>>>>   https://stat.ethz.ch/pipermail/r-sig-geo/2015-November/023656.html
>>>>>>>
>>>>>>>   The details are in this SVN diff
>>>>>>>
>>>>>>>   https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=555&r2=571
>>>>>>>
>>>>>>>   up to the end og the list thread, and this one from then until now:
>>>>>>>
>>>>>>>   https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=571&r2=733
>>>>>>>
>>>>>>>   Summary: could you change drivers, or is it really necessary to fix an
>>>>>>>   EOL
>>>>>>>   problem? What is your use case?
>>>>>>>
>>>>>>>>
>>>>>>>>    Thanks in advance for your time,
>>>>>>>
>>>>>>>   Thanks for a complete example,
>>>>>>>
>>>>>>>   Roger
>>>>>>>
>>>>>>>>    Phil
>>>>>>>>
>>>>>>>>>    sessionInfo()
>>>>>>>>    R version 3.5.1 (2018-07-02)
>>>>>>>>    Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>>>>    Running under: Windows >= 8 x64 (build 9200)
>>>>>>>>
>>>>>>>>    Matrix products: default
>>>>>>>>
>>>>>>>>    locale:
>>>>>>>>    [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
>>>>>>>>    Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>>>>>>>>                            LC_TIME=English_United Kingdom.1252
>>>>>>>>
>>>>>>>>    attached base packages:
>>>>>>>>    [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>>
>>>>>>>>    other attached packages:
>>>>>>>>    [1] rgdal_1.4-4 sp_1.3-1
>>>>>>>>
>>>>>>>>    loaded via a namespace (and not attached):
>>>>>>>>    [1] compiler_3.5.1  tools_3.5.1     yaml_2.2.0      grid_3.5.1
>>>>>>>>    lattice_0.20-35
>>>>>>>>
>>>>>>>>    _______________________________________________
>>>>>>>>    R-sig-Geo mailing list
>>>>>>>>    R-sig-Geo using r-project.org
>>>>>>>>    https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Roger Bivand
>>>> Department of Economics, Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
>>>> https://orcid.org/0000-0003-2392-6140
>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en



More information about the R-sig-Geo mailing list