[R-sig-Geo] rgdal::writeOGR with driver='ESRI Shapefile' converts Polygon object into a hole

Andy Teucher @ndy@teucher @end|ng |rom gm@||@com
Wed Aug 21 01:06:12 CEST 2019


Hi Phil and Roger,

To close the loop on this issue, I have confirmed that the mapshaper javascript library (which rmapshaper wraps) can produce invalid polygons (https://github.com/ateucher/rmapshaper/issues/89), and I will address this in a future release of rmapshaper (https://github.com/ateucher/rmapshaper/issues/90). I have also alerted the maintainer of mapshaper and he is going to add functionality to address this as well (https://github.com/mbloch/mapshaper/issues/352).

Thanks for the thorough investigation and bringing this to my attention!

Cheers,
Andy Teucher (rmapshaper package maintainer)

> Date: Tue, 20 Aug 2019 11:55:00 +0200
> From: Roger Bivand <Roger.Bivand using nhh.no>
> To: Phil Haines <phil.haines82 using gmail.com>
> Cc: <r-sig-geo using r-project.org>
> Subject: Re: [R-sig-Geo]  rgdal::writeOGR with driver='ESRI Shapefile'
> 	converts Polygon object into a hole
> Message-ID: <alpine.LFD.2.21.1908201154410.12186 using reclus.nhh.no>
> Content-Type: text/plain; charset="us-ascii"; Format="flowed"
> 
> Good, thanks very much!
> 
> Roger
> 
> On Mon, 19 Aug 2019, Phil Haines wrote:
> 
>> Hi Roger,
>> 
>> I have added an example to https://github.com/ateucher/rmapshaper/issues/89
>> 
>> Hopefully it illustrates the behaviour I was seeing from
>> rmapshaper::ms_simplify() but please let me know if not.
>> 
>> And thanks for the nudge towards GPKG, I will investigate.
>> Phil
>> 
>> On Mon, 19 Aug 2019 at 15:48, Roger Bivand <Roger.Bivand using nhh.no> wrote:
>>> 
>>> Hi Phil,
>>> 
>>> On Sun, 18 Aug 2019, Phil Haines wrote:
>>> 
>>>> Hi Roger,
>>>> 
>>>> I originally encountered this issue while trying to reduce the size of
>>>> German postal code boundaries. I used rmapshaper::ms_simplify() which
>>>> introduced the polygons sharing a common edge. I definitely didn't
>>>> need them to be separate polygons, and would happily have merged them
>>>> had I been more familiar with gUnaryUnion().
>>> 
>>> If the German postal code boundaries are open, could you please put (an
>>> affected subset) on an URL, and provide a short script to replicate the
>>> workflow. Then we can engage with the rmapshaper maintainer, and see
>>> whether the underlying problem in the workflow is in the js library or its
>>> R wrapper. I've opened: https://github.com/ateucher/rmapshaper/issues/89.
>>> 
>>>> 
>>>> I apologise if my question has resulted in unnecessary effort on your
>>>> part. I reported the behaviour because I found it surprising. However,
>>>> I now understand that this example is not a valid simple feature
>>>> geometry and that the solution is to take steps to ensure I have valid
>>>> geometries, and to repair if not.
>>>> 
>>> 
>>> No need to apologise!! Your reproducible example was excellent, without it
>>> you wouldn't have contributed to getting the internal shapelib code
>>> changed to make it less restrictive in future releases of GDAL,
>>> potentially benefitting lots of users (who really should drop ESRI
>>> shapefiles, and/or should check geometries for validity, but ... whose
>>> work you've helped save). By the way, ESRI would be very happy if
>>> shapefiles stopped being produced in current workflows, and that new files
>>> should rather be GPKG.
>>> 
>>>> Additionally I must confess that I only use the ESRI Shapefile driver
>>>> because I don't know any better... My use cases are reading and
>>>> writing from R (usually to produce maps) and occasionally opening in
>>>> QGIS. I can happily switch to any format that supports this workflow.
>>>> 
>>> 
>>> GPKG is now very broadly supported, is SF-compliant, and has the big
>>> user-facing benefits of not having field name length restrictions, and
>>> many fewer encoding issues for field names and string values.
>>> 
>>> Best wishes,
>>> 
>>> Roger
>>> 
>>>> Thank you very much for your time,
>>>> Phil
>>>> 
>>>> On Sun, 18 Aug 2019 at 13:59, Roger Bivand <Roger.Bivand using nhh.no> wrote:
>>>>> 
>>>>> The reason that the problem occurred is that a MULTIPOLYGON with two
>>>>> exterior rings becomes invalid if the exterior rings touch along an edge
>>>>> (this case). It is important to know the use case, to see whether:
>>>>> 
>>>>> library(rgeos)
>>>>> writeWKT(Ps1)
>>>>> gIsValid(Ps1, reason=TRUE)
>>>>> Ps1a <- gUnaryUnion(gBuffer(Ps1, width=0))
>>>>> gIsValid(Ps1a, reason=TRUE)
>>>>> writeWKT(Ps1a)
>>>>> 
>>>>> or equivalently in sf for sf objects, should be applied before trying to
>>>>> write the object out to file with this driver.
>>>>> 
>>>>> However, because drivers that are compliant with the simple features
>>>>> standard (which bans exterior rings sharing edges) have been permissive
>>>>> and do round-trip this invalid object, a relaxation in the OGR ESRI
>>>>> shapefile driver has been provided and may be included in a future
>>>>> release.
>>>>> 
>>>>> We need to know (see these issues):
>>>>> 
>>>>> https://github.com/r-spatial/sf/issues/1130
>>>>> https://github.com/OSGeo/gdal/issues/1787
>>>>> 
>>>>> why it was desirable to write out this object using this driver? Could an
>>>>> alternative driver have been used, or is ESRI shapefile the only format
>>>>> used in the workflow?
>>>>> 
>>>>> If it has to be this driver, could the workflow be changed to repair
>>>>> degenerate cases before writing? If using sp classes, rgeos may be used to
>>>>> test for and probably repair such geometries before they reach
>>>>> rgdal::writeOGR() for this driver. Adding code to sf and rgdal to trap
>>>>> degenerate cases does encumber all users with valid geometries with
>>>>> the time wasted on extra checking, so building checks into sf and rgdal is
>>>>> not desirable.
>>>>> 
>>>>> I hope it is possible to find out more about the use case quickly, to pass
>>>>> on to GDAL developers to help motivate a relaxation in their current
>>>>> policy with regard to this driver, and to encourage them to include the
>>>>> fix branch in a future release of GDAL.
>>>>> 
>>>>> Roger
>>>>> 
>>>>> On Sat, 17 Aug 2019, Roger Bivand wrote:
>>>>> 
>>>>>> Please follow up both here and on:
>>>>>> 
>>>>>> https://github.com/r-spatial/sf/issues/1130
>>>>>> 
>>>>>> as the problem is also seen in the sf package using the same GDAL ESRI
>>>>>> Shapefile driver.
>>>>>> 
>>>>>> Roger
>>>>>> 
>>>>>> On Fri, 16 Aug 2019, Roger Bivand wrote:
>>>>>> 
>>>>>>> On Fri, 16 Aug 2019, Roger Bivand wrote:
>>>>>>> 
>>>>>>>>  On Tue, 13 Aug 2019, Phil Haines wrote:
>>>>>>>> 
>>>>>>>>>   Dear list,
>>>>>>>>> 
>>>>>>>>>   I have a single Polygons object containing multiple Polygon objects
>>>>>>>>>   that share a common border. When I output this using writeOGR() one of
>>>>>>>>>   the Polygon objects becomes a hole, as the following example shows.
>>>>>>>>> 
>>>>>>>>>   Create a Polygons object containing two adjoining Polygon objects
>>>>>>>>>>   library(rgdal)
>>>>>>>>>>   r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1))
>>>>>>>>>>   r2 <- r1; r2[,1] <- r2[,1]+1
>>>>>>>>>>   Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1)
>>>>>>>>>>   SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)),
>>>>>>>>>>   data.frame(Example=c("Minimal")))
>>>>>>>>> 
>>>>>>>>>   Perform a write/readOGR() cycle
>>>>>>>>>>   fn <- tempfile()
>>>>>>>>>>   writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile')
>>>>>>>>>>   SPDF2 <- readOGR(dsn=fn,layer='test')
>>>>>>>>> 
>>>>>>>>>   Second Polygon object is now a hole
>>>>>>>>>>   sapply(SPDF2 using polygons[[1]]@Polygons,slot,"hole")
>>>>>>>>>   [1] FALSE  TRUE
>>>>>>>>> 
>>>>>>>>>   I see from the sp documentation that "Polygon objects belonging to a
>>>>>>>>>   Polygons object should either not overlap one-other, or should be
>>>>>>>>>   fully included" but I am not sure how this relates to bordering
>>>>>>>>>   Polygon objects. I would welcome any advice as to whether what I am
>>>>>>>>>   asking of writeOGR is reasonable?
>>>>>>>> 
>>>>>>>>  The problem is with the 'ESRI Shapefile' representation and driver:
>>>>>>>> 
>>>>>>>>  library(rgdal)
>>>>>>>>  r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1))
>>>>>>>>  r2 <- r1; r2[,1] <- r2[,1]+1
>>>>>>>>  Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1)
>>>>>>>>  SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)),
>>>>>>>>  data.frame(Example=c("Minimal")))
>>>>>>>>  sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>>  sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "ringDir")
>>>>>>>> 
>>>>>>>>  # which constructs a MULTIPOLYGON object:
>>>>>>>> 
>>>>>>>>  rgeos::writeWKT(SPDF)
>>>>>>>>  library(sf)
>>>>>>>>  st_as_text(st_geometry(st_as_sf(SPDF)))
>>>>>>>> 
>>>>>>>> #    The 'ESRI Shapefile' driver is not Simple-Feature compliant (it
>>>>>>>> #    pre-dates it), so the failure occurs by seeing the second exterior
>>>>>>>> #    ring
>>>>>>>> #    as an interior ring
>>>>>>>> 
>>>>>>>>  fn <- tempfile()
>>>>>>>>  writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile')
>>>>>>>>  SPDF2 <- readOGR(dsn=fn, layer='test')
>>>>>>>>  sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>>  sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, "ringDir")
>>>>>>>>  rgeos::writeWKT(SPDF2)
>>>>>>>>  st_as_text(st_geometry(st_as_sf(SPDF2)))
>>>>>>>> 
>>>>>>>>  # This happens with sf too, using the same GDAL driver:
>>>>>>>> 
>>>>>>>>  sf2 <- st_read(dsn=fn, layer='test')
>>>>>>>>  st_geometry(sf2)
>>>>>>>>  library(sf)
>>>>>>>>  st_as_text(st_geometry(st_as_sf(sf2)))
>>>>>>>>  rgeos::writeWKT(as(sf2, "Spatial"))
>>>>>>>> 
>>>>>>>>  # Adding the comment fix doesn't help:
>>>>>>>> 
>>>>>>>>  comment(slot(SPDF, "polygons")[[1]])
>>>>>>>>  SPDF_c <- rgeos::createSPComment(SPDF)
>>>>>>>>  comment(slot(SPDF_c, "polygons")[[1]])
>>>>>>>>  writeOGR(SPDF_c, fn, layer='test_c', driver='ESRI Shapefile',
>>>>>>>>    verbose=TRUE)
>>>>>>>> 
>>>>>>>> #    reports
>>>>>>>> #    Object initially classed as: wkbPolygon
>>>>>>>> #    SFS comments in Polygons objects
>>>>>>>> #    Object reclassed as: wkbMultiPolygon
>>>>>>>> 
>>>>>>>>  SPDF2_c <- readOGR(dsn=fn, layer='test_c')
>>>>>>>>  sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>>  sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot,
>>>>>>>>  "ringDir")
>>>>>>>>  rgeos::writeWKT(SPDF2_c)
>>>>>>>>  st_as_text(st_geometry(st_as_sf(SPDF2_c)))
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>  # If the input object is written out with the GeoPackage driver:
>>>>>>>> 
>>>>>>>>  fn1 <- tempfile(fileext=".gpkg")
>>>>>>>>  writeOGR(SPDF, fn1, layer="test", driver='GPKG')
>>>>>>>>  sf2a <- st_read(dsn=fn1,layer='test')
>>>>>>>>  st_coordinates(st_geometry(sf2a))
>>>>>>>>  SPDF2a <- readOGR(dsn=fn1)
>>>>>>>>  sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>>  sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, "ringDir")
>>>>>>>>  rgeos::writeWKT(SPDF2a)
>>>>>>>>  st_as_text(st_geometry(st_as_sf(SPDF2a)))
>>>>>>>> 
>>>>>>>>  # the issue is resolved. If we separate the exterior rings:
>>>>>>>> 
>>>>>>>>  r2a <- r1; r2a[,1] <- r2a[,1]+1.00001
>>>>>>>>  Ps1a = Polygons(list(Polygon(r1),Polygon(r2a)),ID=1)
>>>>>>>>  SPDFa = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1a)),
>>>>>>>>  data.frame(Example=c("Minimal")))
>>>>>>>>  fna <- tempfile()
>>>>>>>>  writeOGR(SPDFa, fna, layer='test', driver='ESRI Shapefile')
>>>>>>>>  SPDF2_a <- readOGR(dsn=fna, layer='test')
>>>>>>>>  sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot, "hole")
>>>>>>>>  sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot,
>>>>>>>>  "ringDir")
>>>>>>>>  rgeos::writeWKT(SPDF2_a)
>>>>>>>>  st_as_text(st_geometry(st_as_sf(SPDF2_a)))
>>>>>>>> 
>>>>>>>>  # we are OK as the two exterior rings do not touch.
>>>>>>>> 
>>>>>>>>  # does using sf make a difference?
>>>>>>>> 
>>>>>>>>  fn_s <- tempfile(fileext=".shp")
>>>>>>>>  st_write(st_as_sf(SPDF), dsn=fn_s)
>>>>>>>>  sf_in <- st_read(fn_s)
>>>>>>>>  st_as_text(st_geometry(st_as_sf(sf_in)))
>>>>>>>> 
>>>>>>>>  # No
>>>>>>>> 
>>>>>>>>  fn_s <- tempfile(fileext=".shp")
>>>>>>>>  st_write(st_as_sf(SPDF_c), dsn=fn_s)
>>>>>>>>  sf_in_c <- st_read(fn_s)
>>>>>>>>  st_as_text(st_geometry(st_as_sf(sf_in_c)))
>>>>>>>> 
>>>>>>>>  # nor with the pretend-SF-compliant comment set either.
>>>>>>>> 
>>>>>>>>  So the weakness is in the "ESRI Shapefile" write driver, or possibly in
>>>>>>>>  the OGRGeometryFactory::organizePolygons() function in GDAL used in
>>>>>>>>  OGR_write() (a C++ function) called by writeOGR(). If sf::st_write()
>>>>>>>>  also
>>>>>>>>  calls OGRGeometryFactory::organizePolygons(), we'd maybe consider that
>>>>>>>>  it
>>>>>>>>  has a weakness for the "ESRI Shapefile" driver, but which does not
>>>>>>>>  affect
>>>>>>>>  SF-compliant drivers.
>>>>>>> 
>>>>>>> Without the comment set, OGRGeometryFactory::organizePolygons() is used;
>>>>>>> with it set, OGRGeometryFactory::organizePolygons() is not used, because
>>>>>>> the object is declared as two exterior rings. In both cases, we have the
>>>>>>> output object written out and read back in incorrectly with the ESRI
>>>>>>> shapefile driver, but SF-compliant drivers round-trip (in the test
>>>>>>> GeoJSON), correctly.
>>>>>>> 
>>>>>>> It is likely that the changes made in 2015 to accommodate GeoJSON led to
>>>>>>> this possible regression for the ESRI Shapefile driver. I'm adding this
>>>>>>> geometry to tests/tripup.R and data files; without code changes the hole
>>>>>>> slot is wrong and the ring direction is changes to match, so the
>>>>>>> coordinates change too.
>>>>>>> 
>>>>>>> Reading using the deprecated maptools::readShapeSpatial() also gets a hole
>>>>>>> in the second external ring. However, writing with deprecated
>>>>>>> maptools::  writeSpatialShape() yields a shapefile that when read with
>>>>>>> maptools::  readShapeSpatial() gives the correct two exterior ring, no hole
>>>>>>> object. When read with sf::st_read() and rgdal::readOGR(), the object is
>>>>>>> also correct. So the problem definitely lies in rgdal::writeOGR(), and
>>>>>>> sf::st_write() - roundtripping with sf::st_write() and sf::st_read()
>>>>>>> degrades from MULTIPOLYGON to POLYGON with the ESRI Shapefile driver.
>>>>>>> 
>>>>>>> Roger
>>>>>>> 
>>>>>>>> 
>>>>>>>>  This is probably related to a similar but inverse problem with the
>>>>>>>>  SF-compliant GeoJSON driver in 2015:
>>>>>>>> 
>>>>>>>>  https://stat.ethz.ch/pipermail/r-sig-geo/2015-October/023609.html
>>>>>>>> 
>>>>>>>>  continued the next month in:
>>>>>>>> 
>>>>>>>>  https://stat.ethz.ch/pipermail/r-sig-geo/2015-November/023656.html
>>>>>>>> 
>>>>>>>>  The details are in this SVN diff
>>>>>>>> 
>>>>>>>>  https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=555&r2=571
>>>>>>>> 
>>>>>>>>  up to the end og the list thread, and this one from then until now:
>>>>>>>> 
>>>>>>>>  https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=571&r2=733
>>>>>>>> 
>>>>>>>>  Summary: could you change drivers, or is it really necessary to fix an
>>>>>>>>  EOL
>>>>>>>>  problem? What is your use case?
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>   Thanks in advance for your time,
>>>>>>>> 
>>>>>>>>  Thanks for a complete example,
>>>>>>>> 
>>>>>>>>  Roger
>>>>>>>> 
>>>>>>>>>   Phil
>>>>>>>>> 
>>>>>>>>>>   sessionInfo()
>>>>>>>>>   R version 3.5.1 (2018-07-02)
>>>>>>>>>   Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>>>>>   Running under: Windows >= 8 x64 (build 9200)
>>>>>>>>> 
>>>>>>>>>   Matrix products: default
>>>>>>>>> 
>>>>>>>>>   locale:
>>>>>>>>>   [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
>>>>>>>>>   Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>>>>>>>>>                           LC_TIME=English_United Kingdom.1252
>>>>>>>>> 
>>>>>>>>>   attached base packages:
>>>>>>>>>   [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>>>> 
>>>>>>>>>   other attached packages:
>>>>>>>>>   [1] rgdal_1.4-4 sp_1.3-1
>>>>>>>>> 
>>>>>>>>>   loaded via a namespace (and not attached):
>>>>>>>>>   [1] compiler_3.5.1  tools_3.5.1     yaml_2.2.0      grid_3.5.1
>>>>>>>>>   lattice_0.20-35
>>>>>>>>> 
>>>>>>>>>   _______________________________________________
>>>>>>>>>   R-sig-Geo mailing list
>>>>>>>>>   R-sig-Geo using r-project.org
>>>>>>>>>   https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Roger Bivand
>>>>> Department of Economics, Norwegian School of Economics,
>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
>>>>> https://orcid.org/0000-0003-2392-6140
>>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>> 
>>> 
>>> --
>>> Roger Bivand
>>> Department of Economics, Norwegian School of Economics,
>>> Helleveien 30, N-5045 Bergen, Norway.
>>> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
>>> https://orcid.org/0000-0003-2392-6140
>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>> 
> 
> -- 
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en



More information about the R-sig-Geo mailing list