[R-sig-Geo] Exporting Multipolygon geojson with rgdal::writeOGR

Andy Teucher andy.teucher at gmail.com
Mon Nov 2 23:42:32 CET 2015


Amazing (and fast) work Roger, and thanks for the great explanation.
I can see why you wouldn't want to redo the way Polygons objects are
structured - I'm sure it would be break a huge amount of existing
code.

Just curious, does OSX take longer because you have to deal with
multiple possible GDAL versions?

Thanks again,
Andy

On Mon, Nov 2, 2015 at 1:21 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
> Hi Andy,
>
> On Mon, 26 Oct 2015, Andy Teucher wrote:
>
>> Hi Roger,
>>
>> Sorry I haven't been much help here - it's not for lack of trying,
>> just lack of skills. I've been poring through OGR_write.cpp and trying
>> to figure out language and the flow of things... a few thoughts
>> occurred to me (which are probably obvious to you):
>>
>
> Your example was very helpful, the problem was that when writeOGR() and the
> underlying compiled code was written, the model used was that of the ESRI
> Shapefile driver, rather than OGC SFS, which was first published in 2004.
> Most older formats (and their drivers in OGR) use a OGR geometry factory
> function to organize polygons on reading - but this is not present in
> shapelib (bundled with the R maptools package), so when designing polygon
> classes for sp, and writing rgdal::writeOGR() about 10 years ago, this was
> not known, and was not revisited subsequently, until you happily spotted the
> bug.
>
> GEOS in rgeos uses OGC SFS definitions (the ones you are using below), but
> the SpatialPolygons/Polygons/Polygon structure was already established by
> then, and a fix (using comment attributes in Polygons objects) was used,
> rather than a major change to Polygons objects (in 2010).
>
> The revision to rgdal::writeOGR() now branches on the input objects - if
> they have comments (applied in code, not user-facing), these are trusted,
> and the MultiPolygon objects constructed containing one or more Polygon
> objects with one exterior ring and zero or more interior rings on that
> basis. However, if the input objects do not have comment attributes,
> OGRGeometryFactory::organizePolygons() (C++) is used to create properly
> organized MultiPolygon objects for export.
>
> This means that the pre-OGC SFS drivers continue to work, but that the OGC
> SFS drivers, like GeoJSON, now also work correctly when the second ring in
> an object is not assumed to be an interior ring (rgdal < 1.1-1). A test set
> for all the possible cases has been introduced.
>
> The revision is now in source package form on CRAN:
>
> https://cran.r-project.org/web/packages/rgdal/index.html
>
> and a Windows binary package will be available later on Tuesday I hope. The
> OSX revision may take a little longer, but we are grateful to have what we
> have anyway.
>
> Thanks for a clear and very helpful bug report.
>
> Best wishes,
>
> Roger
>
>
>> 1) Is this bug because holes are standalone polygons in sp objects,
>> while they are members of polygon objects in geojson, wkt, etc?
>>
>> 2) On a related note, I notice that a single polygon with a hole is
>> identified as a multipolygon containing a single polygon with a hole,
>> when using writeOGR and one of the affected drivers.  Practically I
>> don't think this matters too much, but it's not quite right.
>>
>> 3) In lines 116-160 the polygon/multipolygon and line/multiline
>> determination is made for the entire layer. Can/should this be made
>> for each feature in the layer, as you could have a layer with a
>> mixture of single polygons and multipolygons?
>>
>> 4) I feel like there ought to be one more layer of looping through the
>> polyons. e.g.:
>>
>> loop over each feature;
>>     determine whether each feature is a single polygon (length 1 or
>> all rings but one are holes), or a multipolygon:
>>        if a polygon loop over and add ring(s);
>>        if a multipolygon loop over polygons:
>>            for each polygon add ring(s) and somehow assign holes to
>> parent polygons
>>
>> Again, apologies if I'm being totally obvious (or way off base).
>> Thanks for all your work on these packages, and on this issue.
>>
>> Andy
>>
>>
>> On Thu, Oct 22, 2015 at 11:07 PM, Roger Bivand <Roger.Bivand at nhh.no>
>> wrote:
>>>
>>> On Fri, 23 Oct 2015, Andy Teucher wrote:
>>>
>>>> On my Mac, I am running rgdal 1.0-7 and GDAL 1.11.3
>>>>
>>>>
>>>> sessionInfo()
>>>>
>>>
>>> OK, thanks.
>>>
>>> The problem also affects the SQLite driver (the CRAN binaries do not have
>>> this driver). Could someone please check the PostGIS driver - my guess is
>>> that all OGC SFS compliant drivers may be affected.
>>>
>>> Could somebody also please check whether this is a recently introduced
>>> issue
>>> or not? For example somebody still on rgdal < 1.0? src/OGR_write.cpp was
>>> changed 2015-08-21, but the last previous change was 2015-06-11, then
>>> 2015-05-31, 2014-08-17 ... from the ChangeLog visible on the package CRAN
>>> page.
>>>
>>> Roger
>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>> R version 3.2.2 (2015-08-14)
>>>>
>>>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>>>>
>>>> Running under: OS X 10.11 (El Capitan)
>>>>
>>>>
>>>>
>>>>
>>>> locale:
>>>>
>>>> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>>>
>>>>
>>>>
>>>>
>>>> attached base packages:
>>>>
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>>
>>>>
>>>>
>>>> other attached packages:
>>>>
>>>> [1] rgdal_1.0-7 sp_1.2-1
>>>>
>>>>
>>>>
>>>>
>>>> loaded via a namespace (and not attached):
>>>>
>>>> [1] tools_3.2.2     grid_3.2.2      lattice_0.20-33
>>>>
>>>>
>>>>
>>>>
>>>> Andy
>>>>
>>>> On Thu, Oct 22, 2015 at 11:28 AM, Andy Teucher <andy.teucher at gmail.com>
>>>> wrote:
>>>>
>>>>> One more test: WKT has the same issue:
>>>>> library(rgdal)
>>>>> js <- '{
>>>>> "type": "MultiPolygon",
>>>>> "coordinates": [[[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0,
>>>>> 3.0],
>>>>> [102.0, 2.0]]], [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0,
>>>>> 1.0],
>>>>> [100.0, 0.0]]]]
>>>>> } '
>>>>> spdf <- readOGR(js, layer='OGRGeoJSON', verbose=FALSE)
>>>>> writeOGR(spdf, dsn = "test.csv", layer = "test", driver="CSV",
>>>>> layer_options = "GEOMETRY=AS_WKT")
>>>>> cat(readLines("test.csv"), sep = "\n")
>>>>> WKT,FID,
>>>>> "MULTIPOLYGON (((102 2,102 3,103 3,103 2,102 2),(100 0,100 1,101 1,101
>>>>> 0,100 0)))",0
>>>>> It should be:
>>>>> WKT,FID,
>>>>> "MULTIPOLYGON (((102 2,102 3,103 3,103 2,102 2)),((100 0,100 1,101
>>>>> 1,101 0,100 0)))",0
>>>>> Andy
>>>>> On Thu, Oct 22, 2015 at 10:36 AM, Andy Teucher <andy.teucher at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Thanks very much Roger.
>>>>>>
>>>>>> I've replicated this on my work machine (Windows) and on my Mac at
>>>>>> home.
>>>>>>
>>>>>> KML and GML are both affected. ESRI shapefile works fine and, though
>>>>>> I'm not very familiar with the format, 'MapInfo File' works too
>>>>>> (loading the file in qgis shows a multipart polygon with no geometry
>>>>>> errors).
>>>>>>
>>>>>> sessionInfo() for my Windows machine (using rgdal 1.0-7 with included
>>>>>> gdal drivers):
>>>>>>
>>>>>> R version 3.2.2 (2015-08-14)
>>>>>> Platform: i386-w64-mingw32/i386 (32-bit)
>>>>>> Running under: Windows 7 x64 (build 7601) Service Pack 1
>>>>>>
>>>>>> locale:
>>>>>> [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
>>>>>> LC_MONETARY=English_Canada.1252
>>>>>> [4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] rgdal_1.0-7 sp_1.2-0
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] tools_3.2.2     grid_3.2.2      lattice_0.20-33
>>>>>>
>>>>>>
>>>>>> I'll send the details for my Mac environment this evening.
>>>>>>
>>>>>> I'll dig into the source code as soon as I can - C is a completely
>>>>>> foreign language to me, so I can't promise anything fast!
>>>>>>
>>>>>> Thanks again,
>>>>>> Andy
>>>>>>
>>>>>> On Thu, Oct 22, 2015 at 12:27 AM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 22 Oct 2015, Andy Teucher wrote:
>>>>>>>
>>>>>>>> I’m finding that writeOGR isn’t exporting multipolygons properly
>>>>>>>> using
>>>>>>>> the GeoJSON driver. I have a simple test case (borrowed from here:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://gis.stackexchange.com/questions/137977/writeogr-alters-multipolygon-holes)
>>>>>>>> with a geojson string with one multipolygon containing two polygons.
>>>>>>>> I
>>>>>>>> use readOGR to create a SpatialPolygonsDataFrame out of it, then
>>>>>>>> write
>>>>>>>> it with writeOGR:
>>>>>>>>
>>>>>>>> library(rgdal)
>>>>>>>>
>>>>>>>> js <- '{
>>>>>>>> "type": "MultiPolygon",
>>>>>>>> "coordinates": [[[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0,
>>>>>>>> 3.0],
>>>>>>>> [102.0, 2.0]]], [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0,
>>>>>>>> 1.0],
>>>>>>>> [100.0, 0.0]]]]
>>>>>>>> } '
>>>>>>>>
>>>>>>>> spdf <- readOGR(js, layer='OGRGeoJSON', verbose=FALSE) # Create a
>>>>>>>> SpatialPoygonsDataFrame
>>>>>>>>
>>>>>>>> temp <- tempfile()
>>>>>>>> writeOGR(spdf, dsn = temp, layer = "", driver="GeoJSON")
>>>>>>>> cat(readLines(temp))
>>>>>>>>
>>>>>>>> # Output:
>>>>>>>> { "type": "FeatureCollection", "crs": { "type": "name",
>>>>>>>> "properties":
>>>>>>>> { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "features": [ {
>>>>>>>> "type":
>>>>>>>> "Feature", "id": 0, "properties": { "FID": 0 }, "geometry": {
>>>>>>>> "type":
>>>>>>>> "MultiPolygon", "coordinates": [ [ [ [ 102.0, 2.0 ], [ 102.0, 3.0 ],
>>>>>>>> [
>>>>>>>> 103.0, 3.0 ], [ 103.0, 2.0 ], [ 102.0, 2.0 ] ], [ [ 100.0, 0.0 ], [
>>>>>>>> 100.0, 1.0 ], [ 101.0, 1.0 ], [ 101.0, 0.0 ], [ 100.0, 0.0 ] ] ] ] }
>>>>>>>> }
>>>>>>>> ] }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I can replicate this with GDAL 2.0.1 and rgdal 1.0-7 - reading temp
>>>>>>> in
>>>>>>> gives
>>>>>>> the orphaned hole. I'm surprised that passing the js object to
>>>>>>> readOGR()
>>>>>>> doesn't fail, but that isn't the source of the problem. spdf appears
>>>>>>> to
>>>>>>> be
>>>>>>> structured correctly. Please provide your GDAL and rgdal versions,
>>>>>>> and
>>>>>>> OS
>>>>>>> details from sessionInfo().
>>>>>>>
>>>>>>> We are completely relying on the GDAL drivers here - we can't cherry
>>>>>>> pick
>>>>>>> for particular drivers (historically excepting ESRI Shapefile), so
>>>>>>> your
>>>>>>> debugging will need to examine writeOGR(), the C code it calls, and
>>>>>>> the
>>>>>>> interactions between the rgdal C code and called OGR functions.
>>>>>>> Please
>>>>>>> check
>>>>>>> which other drivers are affected (I think KML is, is GML?). Can you
>>>>>>> place
>>>>>>> debugging Rprintf(...); in the rgdal C code to see where this is
>>>>>>> coming
>>>>>>> from?
>>>>>>>
>>>>>>> I can start looking at this sometime next week, so please make a
>>>>>>> start
>>>>>>> yourself; contributions from others are very welcome.
>>>>>>>
>>>>>>> Roger
>>>>>>>
>>>>>>>>
>>>>>>>> If you look closely at the output, you can see that the
>>>>>>>> 'coordinates'
>>>>>>>> array now contains a single polygon array with two coordinate
>>>>>>>> arrays:
>>>>>>>> the boundary, and a second one which is now treated as a hole of the
>>>>>>>> first (orphaned as it is outside the bounds of the polygon). The
>>>>>>>> original 'coordinates' array consists of two polygon arrays, each
>>>>>>>> consisting of a single coordinate array which defining a polygon
>>>>>>>> (with
>>>>>>>> no holes), which is correct according the GeoJSON spec:
>>>>>>>> http://geojson.org/geojson-spec.html#polygon.
>>>>>>>>
>>>>>>>> I'm always hesitant to call things a bug, but this doesn't appear to
>>>>>>>> happen using ogr2ogr on the command line:
>>>>>>>>
>>>>>>>> writeOGR(spdf, ".", "test", driver = "ESRI Shapefile") # Write a
>>>>>>>> shapefile to convert using ogr2ogr
>>>>>>>> system("ogr2ogr -f GeoJSON test_from_shp.geojson test.shp")
>>>>>>>> cat(readLines("test_from_shp.geojson"))
>>>>>>>>
>>>>>>>> # Output:
>>>>>>>> { "type": "FeatureCollection",  "features": [ { "type": "Feature",
>>>>>>>> "properties": { "FID": 0 }, "geometry": { "type": "MultiPolygon",
>>>>>>>> "coordinates": [ [ [ [ 102.0, 2.0 ], [ 102.0, 3.0 ], [ 103.0, 3.0 ],
>>>>>>>> [
>>>>>>>> 103.0, 2.0 ], [ 102.0, 2.0 ] ] ], [ [ [ 100.0, 0.0 ], [ 100.0, 1.0
>>>>>>>> ],
>>>>>>>> [ 101.0, 1.0 ], [ 101.0, 0.0 ], [ 100.0, 0.0 ] ] ] ] } } ] }
>>>>>>>>
>>>>>>>> The resulting output is correct.
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks in advance for any help on this.
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> R-sig-Geo mailing list
>>>>>>>> R-sig-Geo at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Roger Bivand
>>>>>>> Department of Economics, Norwegian School of Economics,
>>>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>>>> voice: +47 55 95 93 55; fax +47 55 95 91 00
>>>>>>> e-mail: Roger.Bivand at nhh.no
>>>>
>>>>
>>>>
>>>
>>> --
>>> Roger Bivand
>>> Department of Economics, Norwegian School of Economics,
>>> Helleveien 30, N-5045 Bergen, Norway.
>>> voice: +47 55 95 93 55; fax +47 55 95 91 00
>>> e-mail: Roger.Bivand at nhh.no
>>
>>
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 91 00
> e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list