[R-sig-Geo] Exporting Multipolygon geojson with rgdal::writeOGR

Roger Bivand Roger.Bivand at nhh.no
Mon Nov 2 22:21:56 CET 2015


Hi Andy,

On Mon, 26 Oct 2015, Andy Teucher wrote:

> Hi Roger,
>
> Sorry I haven't been much help here - it's not for lack of trying,
> just lack of skills. I've been poring through OGR_write.cpp and trying
> to figure out language and the flow of things... a few thoughts
> occurred to me (which are probably obvious to you):
>

Your example was very helpful, the problem was that when writeOGR() and 
the underlying compiled code was written, the model used was that of the 
ESRI Shapefile driver, rather than OGC SFS, which was first published in 
2004. Most older formats (and their drivers in OGR) use a OGR geometry 
factory function to organize polygons on reading - but this is not present 
in shapelib (bundled with the R maptools package), so when designing 
polygon classes for sp, and writing rgdal::writeOGR() about 10 years ago, 
this was not known, and was not revisited subsequently, until you happily 
spotted the bug.

GEOS in rgeos uses OGC SFS definitions (the ones you are using below), but 
the SpatialPolygons/Polygons/Polygon structure was already established by 
then, and a fix (using comment attributes in Polygons objects) was used, 
rather than a major change to Polygons objects (in 2010).

The revision to rgdal::writeOGR() now branches on the input objects - if 
they have comments (applied in code, not user-facing), these are trusted, 
and the MultiPolygon objects constructed containing one or more Polygon 
objects with one exterior ring and zero or more interior rings on that 
basis. However, if the input objects do not have comment attributes, 
OGRGeometryFactory::organizePolygons() (C++) is used to create properly 
organized MultiPolygon objects for export.

This means that the pre-OGC SFS drivers continue to work, but that the OGC 
SFS drivers, like GeoJSON, now also work correctly when the second ring in 
an object is not assumed to be an interior ring (rgdal < 1.1-1). A test 
set for all the possible cases has been introduced.

The revision is now in source package form on CRAN:

https://cran.r-project.org/web/packages/rgdal/index.html

and a Windows binary package will be available later on Tuesday I hope. 
The OSX revision may take a little longer, but we are grateful to have 
what we have anyway.

Thanks for a clear and very helpful bug report.

Best wishes,

Roger

> 1) Is this bug because holes are standalone polygons in sp objects,
> while they are members of polygon objects in geojson, wkt, etc?
>
> 2) On a related note, I notice that a single polygon with a hole is
> identified as a multipolygon containing a single polygon with a hole,
> when using writeOGR and one of the affected drivers.  Practically I
> don't think this matters too much, but it's not quite right.
>
> 3) In lines 116-160 the polygon/multipolygon and line/multiline
> determination is made for the entire layer. Can/should this be made
> for each feature in the layer, as you could have a layer with a
> mixture of single polygons and multipolygons?
>
> 4) I feel like there ought to be one more layer of looping through the
> polyons. e.g.:
>
> loop over each feature;
>     determine whether each feature is a single polygon (length 1 or
> all rings but one are holes), or a multipolygon:
>        if a polygon loop over and add ring(s);
>        if a multipolygon loop over polygons:
>            for each polygon add ring(s) and somehow assign holes to
> parent polygons
>
> Again, apologies if I'm being totally obvious (or way off base).
> Thanks for all your work on these packages, and on this issue.
>
> Andy
>
>
> On Thu, Oct 22, 2015 at 11:07 PM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
>> On Fri, 23 Oct 2015, Andy Teucher wrote:
>>
>>> On my Mac, I am running rgdal 1.0-7 and GDAL 1.11.3
>>>
>>>
>>> sessionInfo()
>>>
>>
>> OK, thanks.
>>
>> The problem also affects the SQLite driver (the CRAN binaries do not have
>> this driver). Could someone please check the PostGIS driver - my guess is
>> that all OGC SFS compliant drivers may be affected.
>>
>> Could somebody also please check whether this is a recently introduced issue
>> or not? For example somebody still on rgdal < 1.0? src/OGR_write.cpp was
>> changed 2015-08-21, but the last previous change was 2015-06-11, then
>> 2015-05-31, 2014-08-17 ... from the ChangeLog visible on the package CRAN
>> page.
>>
>> Roger
>>
>>
>>>
>>>
>>>
>>>
>>> R version 3.2.2 (2015-08-14)
>>>
>>> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>>>
>>> Running under: OS X 10.11 (El Capitan)
>>>
>>>
>>>
>>>
>>> locale:
>>>
>>> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>>
>>>
>>>
>>>
>>> attached base packages:
>>>
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>>
>>>
>>>
>>> other attached packages:
>>>
>>> [1] rgdal_1.0-7 sp_1.2-1
>>>
>>>
>>>
>>>
>>> loaded via a namespace (and not attached):
>>>
>>> [1] tools_3.2.2     grid_3.2.2      lattice_0.20-33
>>>
>>>
>>>
>>>
>>> Andy
>>>
>>> On Thu, Oct 22, 2015 at 11:28 AM, Andy Teucher <andy.teucher at gmail.com>
>>> wrote:
>>>
>>>> One more test: WKT has the same issue:
>>>> library(rgdal)
>>>> js <- '{
>>>> "type": "MultiPolygon",
>>>> "coordinates": [[[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0],
>>>> [102.0, 2.0]]], [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0],
>>>> [100.0, 0.0]]]]
>>>> } '
>>>> spdf <- readOGR(js, layer='OGRGeoJSON', verbose=FALSE)
>>>> writeOGR(spdf, dsn = "test.csv", layer = "test", driver="CSV",
>>>> layer_options = "GEOMETRY=AS_WKT")
>>>> cat(readLines("test.csv"), sep = "\n")
>>>> WKT,FID,
>>>> "MULTIPOLYGON (((102 2,102 3,103 3,103 2,102 2),(100 0,100 1,101 1,101
>>>> 0,100 0)))",0
>>>> It should be:
>>>> WKT,FID,
>>>> "MULTIPOLYGON (((102 2,102 3,103 3,103 2,102 2)),((100 0,100 1,101
>>>> 1,101 0,100 0)))",0
>>>> Andy
>>>> On Thu, Oct 22, 2015 at 10:36 AM, Andy Teucher <andy.teucher at gmail.com>
>>>> wrote:
>>>>>
>>>>> Thanks very much Roger.
>>>>>
>>>>> I've replicated this on my work machine (Windows) and on my Mac at home.
>>>>>
>>>>> KML and GML are both affected. ESRI shapefile works fine and, though
>>>>> I'm not very familiar with the format, 'MapInfo File' works too
>>>>> (loading the file in qgis shows a multipart polygon with no geometry
>>>>> errors).
>>>>>
>>>>> sessionInfo() for my Windows machine (using rgdal 1.0-7 with included
>>>>> gdal drivers):
>>>>>
>>>>> R version 3.2.2 (2015-08-14)
>>>>> Platform: i386-w64-mingw32/i386 (32-bit)
>>>>> Running under: Windows 7 x64 (build 7601) Service Pack 1
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252
>>>>> LC_MONETARY=English_Canada.1252
>>>>> [4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>> [1] rgdal_1.0-7 sp_1.2-0
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] tools_3.2.2     grid_3.2.2      lattice_0.20-33
>>>>>
>>>>>
>>>>> I'll send the details for my Mac environment this evening.
>>>>>
>>>>> I'll dig into the source code as soon as I can - C is a completely
>>>>> foreign language to me, so I can't promise anything fast!
>>>>>
>>>>> Thanks again,
>>>>> Andy
>>>>>
>>>>> On Thu, Oct 22, 2015 at 12:27 AM, Roger Bivand <Roger.Bivand at nhh.no>
>>>>> wrote:
>>>>>>
>>>>>> On Thu, 22 Oct 2015, Andy Teucher wrote:
>>>>>>
>>>>>>> I’m finding that writeOGR isn’t exporting multipolygons properly using
>>>>>>> the GeoJSON driver. I have a simple test case (borrowed from here:
>>>>>>>
>>>>>>>
>>>>>>> http://gis.stackexchange.com/questions/137977/writeogr-alters-multipolygon-holes)
>>>>>>> with a geojson string with one multipolygon containing two polygons. I
>>>>>>> use readOGR to create a SpatialPolygonsDataFrame out of it, then write
>>>>>>> it with writeOGR:
>>>>>>>
>>>>>>> library(rgdal)
>>>>>>>
>>>>>>> js <- '{
>>>>>>> "type": "MultiPolygon",
>>>>>>> "coordinates": [[[[102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0,
>>>>>>> 3.0],
>>>>>>> [102.0, 2.0]]], [[[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0,
>>>>>>> 1.0],
>>>>>>> [100.0, 0.0]]]]
>>>>>>> } '
>>>>>>>
>>>>>>> spdf <- readOGR(js, layer='OGRGeoJSON', verbose=FALSE) # Create a
>>>>>>> SpatialPoygonsDataFrame
>>>>>>>
>>>>>>> temp <- tempfile()
>>>>>>> writeOGR(spdf, dsn = temp, layer = "", driver="GeoJSON")
>>>>>>> cat(readLines(temp))
>>>>>>>
>>>>>>> # Output:
>>>>>>> { "type": "FeatureCollection", "crs": { "type": "name", "properties":
>>>>>>> { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } }, "features": [ { "type":
>>>>>>> "Feature", "id": 0, "properties": { "FID": 0 }, "geometry": { "type":
>>>>>>> "MultiPolygon", "coordinates": [ [ [ [ 102.0, 2.0 ], [ 102.0, 3.0 ], [
>>>>>>> 103.0, 3.0 ], [ 103.0, 2.0 ], [ 102.0, 2.0 ] ], [ [ 100.0, 0.0 ], [
>>>>>>> 100.0, 1.0 ], [ 101.0, 1.0 ], [ 101.0, 0.0 ], [ 100.0, 0.0 ] ] ] ] } }
>>>>>>> ] }
>>>>>>
>>>>>>
>>>>>>
>>>>>> I can replicate this with GDAL 2.0.1 and rgdal 1.0-7 - reading temp in
>>>>>> gives
>>>>>> the orphaned hole. I'm surprised that passing the js object to
>>>>>> readOGR()
>>>>>> doesn't fail, but that isn't the source of the problem. spdf appears to
>>>>>> be
>>>>>> structured correctly. Please provide your GDAL and rgdal versions, and
>>>>>> OS
>>>>>> details from sessionInfo().
>>>>>>
>>>>>> We are completely relying on the GDAL drivers here - we can't cherry
>>>>>> pick
>>>>>> for particular drivers (historically excepting ESRI Shapefile), so your
>>>>>> debugging will need to examine writeOGR(), the C code it calls, and the
>>>>>> interactions between the rgdal C code and called OGR functions. Please
>>>>>> check
>>>>>> which other drivers are affected (I think KML is, is GML?). Can you
>>>>>> place
>>>>>> debugging Rprintf(...); in the rgdal C code to see where this is coming
>>>>>> from?
>>>>>>
>>>>>> I can start looking at this sometime next week, so please make a start
>>>>>> yourself; contributions from others are very welcome.
>>>>>>
>>>>>> Roger
>>>>>>
>>>>>>>
>>>>>>> If you look closely at the output, you can see that the 'coordinates'
>>>>>>> array now contains a single polygon array with two coordinate arrays:
>>>>>>> the boundary, and a second one which is now treated as a hole of the
>>>>>>> first (orphaned as it is outside the bounds of the polygon). The
>>>>>>> original 'coordinates' array consists of two polygon arrays, each
>>>>>>> consisting of a single coordinate array which defining a polygon (with
>>>>>>> no holes), which is correct according the GeoJSON spec:
>>>>>>> http://geojson.org/geojson-spec.html#polygon.
>>>>>>>
>>>>>>> I'm always hesitant to call things a bug, but this doesn't appear to
>>>>>>> happen using ogr2ogr on the command line:
>>>>>>>
>>>>>>> writeOGR(spdf, ".", "test", driver = "ESRI Shapefile") # Write a
>>>>>>> shapefile to convert using ogr2ogr
>>>>>>> system("ogr2ogr -f GeoJSON test_from_shp.geojson test.shp")
>>>>>>> cat(readLines("test_from_shp.geojson"))
>>>>>>>
>>>>>>> # Output:
>>>>>>> { "type": "FeatureCollection",  "features": [ { "type": "Feature",
>>>>>>> "properties": { "FID": 0 }, "geometry": { "type": "MultiPolygon",
>>>>>>> "coordinates": [ [ [ [ 102.0, 2.0 ], [ 102.0, 3.0 ], [ 103.0, 3.0 ], [
>>>>>>> 103.0, 2.0 ], [ 102.0, 2.0 ] ] ], [ [ [ 100.0, 0.0 ], [ 100.0, 1.0 ],
>>>>>>> [ 101.0, 1.0 ], [ 101.0, 0.0 ], [ 100.0, 0.0 ] ] ] ] } } ] }
>>>>>>>
>>>>>>> The resulting output is correct.
>>>>>>>
>>>>>>>
>>>>>>> Thanks in advance for any help on this.
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-Geo mailing list
>>>>>>> R-sig-Geo at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Roger Bivand
>>>>>> Department of Economics, Norwegian School of Economics,
>>>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>>>> voice: +47 55 95 93 55; fax +47 55 95 91 00
>>>>>> e-mail: Roger.Bivand at nhh.no
>>>
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; fax +47 55 95 91 00
>> e-mail: Roger.Bivand at nhh.no
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no


More information about the R-sig-Geo mailing list