[R-sig-Geo] rgdal::writeOGR with driver='ESRI Shapefile' converts Polygon object into a hole
Phil Haines
ph||@h@|ne@82 @end|ng |rom gm@||@com
Sun Aug 18 22:33:46 CEST 2019
Hi Roger,
I originally encountered this issue while trying to reduce the size of
German postal code boundaries. I used rmapshaper::ms_simplify() which
introduced the polygons sharing a common edge. I definitely didn't
need them to be separate polygons, and would happily have merged them
had I been more familiar with gUnaryUnion().
I apologise if my question has resulted in unnecessary effort on your
part. I reported the behaviour because I found it surprising. However,
I now understand that this example is not a valid simple feature
geometry and that the solution is to take steps to ensure I have valid
geometries, and to repair if not.
Additionally I must confess that I only use the ESRI Shapefile driver
because I don't know any better... My use cases are reading and
writing from R (usually to produce maps) and occasionally opening in
QGIS. I can happily switch to any format that supports this workflow.
Thank you very much for your time,
Phil
On Sun, 18 Aug 2019 at 13:59, Roger Bivand <Roger.Bivand using nhh.no> wrote:
>
> The reason that the problem occurred is that a MULTIPOLYGON with two
> exterior rings becomes invalid if the exterior rings touch along an edge
> (this case). It is important to know the use case, to see whether:
>
> library(rgeos)
> writeWKT(Ps1)
> gIsValid(Ps1, reason=TRUE)
> Ps1a <- gUnaryUnion(gBuffer(Ps1, width=0))
> gIsValid(Ps1a, reason=TRUE)
> writeWKT(Ps1a)
>
> or equivalently in sf for sf objects, should be applied before trying to
> write the object out to file with this driver.
>
> However, because drivers that are compliant with the simple features
> standard (which bans exterior rings sharing edges) have been permissive
> and do round-trip this invalid object, a relaxation in the OGR ESRI
> shapefile driver has been provided and may be included in a future
> release.
>
> We need to know (see these issues):
>
> https://github.com/r-spatial/sf/issues/1130
> https://github.com/OSGeo/gdal/issues/1787
>
> why it was desirable to write out this object using this driver? Could an
> alternative driver have been used, or is ESRI shapefile the only format
> used in the workflow?
>
> If it has to be this driver, could the workflow be changed to repair
> degenerate cases before writing? If using sp classes, rgeos may be used to
> test for and probably repair such geometries before they reach
> rgdal::writeOGR() for this driver. Adding code to sf and rgdal to trap
> degenerate cases does encumber all users with valid geometries with
> the time wasted on extra checking, so building checks into sf and rgdal is
> not desirable.
>
> I hope it is possible to find out more about the use case quickly, to pass
> on to GDAL developers to help motivate a relaxation in their current
> policy with regard to this driver, and to encourage them to include the
> fix branch in a future release of GDAL.
>
> Roger
>
> On Sat, 17 Aug 2019, Roger Bivand wrote:
>
> > Please follow up both here and on:
> >
> > https://github.com/r-spatial/sf/issues/1130
> >
> > as the problem is also seen in the sf package using the same GDAL ESRI
> > Shapefile driver.
> >
> > Roger
> >
> > On Fri, 16 Aug 2019, Roger Bivand wrote:
> >
> >> On Fri, 16 Aug 2019, Roger Bivand wrote:
> >>
> >>> On Tue, 13 Aug 2019, Phil Haines wrote:
> >>>
> >>>> Dear list,
> >>>>
> >>>> I have a single Polygons object containing multiple Polygon objects
> >>>> that share a common border. When I output this using writeOGR() one of
> >>>> the Polygon objects becomes a hole, as the following example shows.
> >>>>
> >>>> Create a Polygons object containing two adjoining Polygon objects
> >>>>> library(rgdal)
> >>>>> r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1))
> >>>>> r2 <- r1; r2[,1] <- r2[,1]+1
> >>>>> Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1)
> >>>>> SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)),
> >>>>> data.frame(Example=c("Minimal")))
> >>>>
> >>>> Perform a write/readOGR() cycle
> >>>>> fn <- tempfile()
> >>>>> writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile')
> >>>>> SPDF2 <- readOGR(dsn=fn,layer='test')
> >>>>
> >>>> Second Polygon object is now a hole
> >>>>> sapply(SPDF2 using polygons[[1]]@Polygons,slot,"hole")
> >>>> [1] FALSE TRUE
> >>>>
> >>>> I see from the sp documentation that "Polygon objects belonging to a
> >>>> Polygons object should either not overlap one-other, or should be
> >>>> fully included" but I am not sure how this relates to bordering
> >>>> Polygon objects. I would welcome any advice as to whether what I am
> >>>> asking of writeOGR is reasonable?
> >>>
> >>> The problem is with the 'ESRI Shapefile' representation and driver:
> >>>
> >>> library(rgdal)
> >>> r1 <- rbind(c(1,1),c(1,2),c(2,2),c(2,1),c(1,1))
> >>> r2 <- r1; r2[,1] <- r2[,1]+1
> >>> Ps1 = Polygons(list(Polygon(r1),Polygon(r2)),ID=1)
> >>> SPDF = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1)),
> >>> data.frame(Example=c("Minimal")))
> >>> sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "hole")
> >>> sapply(slot(slot(SPDF, "polygons")[[1]], "Polygons"), slot, "ringDir")
> >>>
> >>> # which constructs a MULTIPOLYGON object:
> >>>
> >>> rgeos::writeWKT(SPDF)
> >>> library(sf)
> >>> st_as_text(st_geometry(st_as_sf(SPDF)))
> >>>
> >>> # The 'ESRI Shapefile' driver is not Simple-Feature compliant (it
> >>> # pre-dates it), so the failure occurs by seeing the second exterior
> >>> # ring
> >>> # as an interior ring
> >>>
> >>> fn <- tempfile()
> >>> writeOGR(SPDF, fn, layer='test', driver='ESRI Shapefile')
> >>> SPDF2 <- readOGR(dsn=fn, layer='test')
> >>> sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, "hole")
> >>> sapply(slot(slot(SPDF2, "polygons")[[1]], "Polygons"), slot, "ringDir")
> >>> rgeos::writeWKT(SPDF2)
> >>> st_as_text(st_geometry(st_as_sf(SPDF2)))
> >>>
> >>> # This happens with sf too, using the same GDAL driver:
> >>>
> >>> sf2 <- st_read(dsn=fn, layer='test')
> >>> st_geometry(sf2)
> >>> library(sf)
> >>> st_as_text(st_geometry(st_as_sf(sf2)))
> >>> rgeos::writeWKT(as(sf2, "Spatial"))
> >>>
> >>> # Adding the comment fix doesn't help:
> >>>
> >>> comment(slot(SPDF, "polygons")[[1]])
> >>> SPDF_c <- rgeos::createSPComment(SPDF)
> >>> comment(slot(SPDF_c, "polygons")[[1]])
> >>> writeOGR(SPDF_c, fn, layer='test_c', driver='ESRI Shapefile',
> >>> verbose=TRUE)
> >>>
> >>> # reports
> >>> # Object initially classed as: wkbPolygon
> >>> # SFS comments in Polygons objects
> >>> # Object reclassed as: wkbMultiPolygon
> >>>
> >>> SPDF2_c <- readOGR(dsn=fn, layer='test_c')
> >>> sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot, "hole")
> >>> sapply(slot(slot(SPDF2_c, "polygons")[[1]], "Polygons"), slot,
> >>> "ringDir")
> >>> rgeos::writeWKT(SPDF2_c)
> >>> st_as_text(st_geometry(st_as_sf(SPDF2_c)))
> >>>
> >>>
> >>>
> >>> # If the input object is written out with the GeoPackage driver:
> >>>
> >>> fn1 <- tempfile(fileext=".gpkg")
> >>> writeOGR(SPDF, fn1, layer="test", driver='GPKG')
> >>> sf2a <- st_read(dsn=fn1,layer='test')
> >>> st_coordinates(st_geometry(sf2a))
> >>> SPDF2a <- readOGR(dsn=fn1)
> >>> sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, "hole")
> >>> sapply(slot(slot(SPDF2a, "polygons")[[1]], "Polygons"), slot, "ringDir")
> >>> rgeos::writeWKT(SPDF2a)
> >>> st_as_text(st_geometry(st_as_sf(SPDF2a)))
> >>>
> >>> # the issue is resolved. If we separate the exterior rings:
> >>>
> >>> r2a <- r1; r2a[,1] <- r2a[,1]+1.00001
> >>> Ps1a = Polygons(list(Polygon(r1),Polygon(r2a)),ID=1)
> >>> SPDFa = SpatialPolygonsDataFrame( SpatialPolygons(list(Ps1a)),
> >>> data.frame(Example=c("Minimal")))
> >>> fna <- tempfile()
> >>> writeOGR(SPDFa, fna, layer='test', driver='ESRI Shapefile')
> >>> SPDF2_a <- readOGR(dsn=fna, layer='test')
> >>> sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot, "hole")
> >>> sapply(slot(slot(SPDF2_a, "polygons")[[1]], "Polygons"), slot,
> >>> "ringDir")
> >>> rgeos::writeWKT(SPDF2_a)
> >>> st_as_text(st_geometry(st_as_sf(SPDF2_a)))
> >>>
> >>> # we are OK as the two exterior rings do not touch.
> >>>
> >>> # does using sf make a difference?
> >>>
> >>> fn_s <- tempfile(fileext=".shp")
> >>> st_write(st_as_sf(SPDF), dsn=fn_s)
> >>> sf_in <- st_read(fn_s)
> >>> st_as_text(st_geometry(st_as_sf(sf_in)))
> >>>
> >>> # No
> >>>
> >>> fn_s <- tempfile(fileext=".shp")
> >>> st_write(st_as_sf(SPDF_c), dsn=fn_s)
> >>> sf_in_c <- st_read(fn_s)
> >>> st_as_text(st_geometry(st_as_sf(sf_in_c)))
> >>>
> >>> # nor with the pretend-SF-compliant comment set either.
> >>>
> >>> So the weakness is in the "ESRI Shapefile" write driver, or possibly in
> >>> the OGRGeometryFactory::organizePolygons() function in GDAL used in
> >>> OGR_write() (a C++ function) called by writeOGR(). If sf::st_write()
> >>> also
> >>> calls OGRGeometryFactory::organizePolygons(), we'd maybe consider that
> >>> it
> >>> has a weakness for the "ESRI Shapefile" driver, but which does not
> >>> affect
> >>> SF-compliant drivers.
> >>
> >> Without the comment set, OGRGeometryFactory::organizePolygons() is used;
> >> with it set, OGRGeometryFactory::organizePolygons() is not used, because
> >> the object is declared as two exterior rings. In both cases, we have the
> >> output object written out and read back in incorrectly with the ESRI
> >> shapefile driver, but SF-compliant drivers round-trip (in the test
> >> GeoJSON), correctly.
> >>
> >> It is likely that the changes made in 2015 to accommodate GeoJSON led to
> >> this possible regression for the ESRI Shapefile driver. I'm adding this
> >> geometry to tests/tripup.R and data files; without code changes the hole
> >> slot is wrong and the ring direction is changes to match, so the
> >> coordinates change too.
> >>
> >> Reading using the deprecated maptools::readShapeSpatial() also gets a hole
> >> in the second external ring. However, writing with deprecated
> >> maptools:: writeSpatialShape() yields a shapefile that when read with
> >> maptools:: readShapeSpatial() gives the correct two exterior ring, no hole
> >> object. When read with sf::st_read() and rgdal::readOGR(), the object is
> >> also correct. So the problem definitely lies in rgdal::writeOGR(), and
> >> sf::st_write() - roundtripping with sf::st_write() and sf::st_read()
> >> degrades from MULTIPOLYGON to POLYGON with the ESRI Shapefile driver.
> >>
> >> Roger
> >>
> >>>
> >>> This is probably related to a similar but inverse problem with the
> >>> SF-compliant GeoJSON driver in 2015:
> >>>
> >>> https://stat.ethz.ch/pipermail/r-sig-geo/2015-October/023609.html
> >>>
> >>> continued the next month in:
> >>>
> >>> https://stat.ethz.ch/pipermail/r-sig-geo/2015-November/023656.html
> >>>
> >>> The details are in this SVN diff
> >>>
> >>> https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=555&r2=571
> >>>
> >>> up to the end og the list thread, and this one from then until now:
> >>>
> >>> https://r-forge.r-project.org/scm/viewvc.php/pkg/src/OGR_write.cpp?root=rgdal&r1=571&r2=733
> >>>
> >>> Summary: could you change drivers, or is it really necessary to fix an
> >>> EOL
> >>> problem? What is your use case?
> >>>
> >>>>
> >>>> Thanks in advance for your time,
> >>>
> >>> Thanks for a complete example,
> >>>
> >>> Roger
> >>>
> >>>> Phil
> >>>>
> >>>>> sessionInfo()
> >>>> R version 3.5.1 (2018-07-02)
> >>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
> >>>> Running under: Windows >= 8 x64 (build 9200)
> >>>>
> >>>> Matrix products: default
> >>>>
> >>>> locale:
> >>>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United
> >>>> Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
> >>>> LC_TIME=English_United Kingdom.1252
> >>>>
> >>>> attached base packages:
> >>>> [1] stats graphics grDevices utils datasets methods base
> >>>>
> >>>> other attached packages:
> >>>> [1] rgdal_1.4-4 sp_1.3-1
> >>>>
> >>>> loaded via a namespace (and not attached):
> >>>> [1] compiler_3.5.1 tools_3.5.1 yaml_2.2.0 grid_3.5.1
> >>>> lattice_0.20-35
> >>>>
> >>>> _______________________________________________
> >>>> R-sig-Geo mailing list
> >>>> R-sig-Geo using r-project.org
> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: Roger.Bivand using nhh.no
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
More information about the R-sig-Geo
mailing list