[R-sig-Geo] Cleaning small spatial polygons

Roger Bivand Roger.Bivand at nhh.no
Sat Nov 7 19:19:23 CET 2015


On Sat, 7 Nov 2015, Roger Bivand wrote:

> On Sat, 7 Nov 2015, Eduardo Diez wrote:
>
>>  Roger,
>>  Following this last approach strange things happen: the sliver polygons
>>  that are fully inside another one get "merged" with the larger; the ones
>>  that have no contact with the exterior but share boundaries with two or
>>  more polygons get deleted and a hole left behind; the ones in contact with
>>  the exterior get deleted.
>
> It is important to actually find out what GRASS v.clean intends to do. I 
> think that it is meant to remove precision and digitizing artefacts, really.

The manual says:

"tool=rmarea

The rmarea tool removes all areas <= thresh. The longest boundary with an 
adjacent area is removed or all boundaries if there is no adjacent area. 
Area categories are not combined when a small area is merged with a larger 
area."

So to replicate v.clean tool=rmarea, you'd need to find the proportions of 
boundary lengths with neighbouring polygons in order to choose the larger 
polygon with which to merge. This cannot be done in R anyway, as the 
vector model is not topological. GRASS uses a topological model, so can 
find the boundary lengths between neighbouring polygons. Note that the 
operation overwrites the attribute values of the smaller, merged polygon 
with those of the larger.

So a workflow that would emulate this in rasterToPolygons() might be first 
to detect all patches, and for patches of fewer than 15 cells, change 
their values (categorical) to those of the majority of neighbours. Trying 
a 3x3 filter would correspond to only 9 cells, so ~ 1800 m2, and would 
also dilate or erode boundaries between larger patches, so it would be 
quite tedious and would potentially affect your inferences. You could also 
do this in GRASS, possibly using r.clump, r.mfilter and the r.li.* suite 
to find the < threshold patches. Did you use raster::clump() to generate 
raster1?

Roger

>
>>  This is what i tried. As the polygons were projected i used values
>>  in squared meters for the set_RGEOS_polyThreshold function. I think values
>>  of around 1e-2 are more appropriate for Spatial* objects in GCS.
>> 
>
> No, the use of 1e-2 was for projected coordinates, and slivers were very, 
> very thin polygons (slivers) created by coordinate precision issues (with 
> areas o ~ 1e-5 m2).
>
> So the underlying question is why you want to remove non-sliver geometries 
> (that is geometries that do create visual clutter, but which are not caused 
> by coordinate precision issues). In fact, you are probably actually trying to 
> aggregate Polygons objects - those which have data attributes, with 
> neighbouring but larger polygons. You then need to choose (the algorithms 
> could, but you need to specify) which Polygons objects are to be aggregated 
> first, and then run gUnaryUnion() to merge the geometries, then aggregate the 
> attribute data, and finally remove any slivers that may be present.
>
> You also seem to have many geometries of 20.169 m2 in area, or multiples of 
> this, presumably created by rasterToPolygons(). They are likely isolated 
> single pixels of another value in raster 1. Could you use a focal method in 
> raster to "smooth" them out first? What are the statistical properties of the 
> "smoothed" patches - they will no longer be as homogeneous as they were 
> before - does this matter?
>
> Roger
>
>> >  pol1 <- rasterToPolygons(raster1, dissolve = T)
>> >  pol2 <- createSPComment(pol1)
>> >  set_RGEOS_polyThreshold(3000) # for example
>> >  set_RGEOS_warnSlivers(T)
>> >  set_RGEOS_dropSlivers(T)
>> >  t1 <- gBuffer(pol2, byid = T, width = 0)
>>  There were 50 or more warnings (use warnings() to see the first 50)
>> >  warnings()
>>  Warning messages:
>>  1: In gBuffer(pol2, byid = T, width = 0) : 1: Polygon object 0 area
>>  726.084
>>  2: In gBuffer(pol2, byid = T, width = 0) : 2: Polygon object 1 area 20.169
>>  3: In gBuffer(pol2, byid = T, width = 0) : 3: Polygon object 2 area 20.169
>>  4: In gBuffer(pol2, byid = T, width = 0) : 4: Polygon object 3 area 20.169
>>  5: In gBuffer(pol2, byid = T, width = 0) : 5: Polygon object 4 area 20.169
>>  6: In gBuffer(pol2, byid = T, width = 0) : 6: Polygon object 5 area 20.169
>>  7: In gBuffer(pol2, byid = T, width = 0) : 7: Polygon object 6 area 80.676
>>  8: In gBuffer(pol2, byid = T, width = 0) : 8: Polygon object 7 area 20.169
>>  9: In gBuffer(pol2, byid = T, width = 0) : 9: Polygon object 8 area 20.169
>>  10: In gBuffer(pol2, byid = T, width = 0) : 10: Polygon object 9 area
>>  20.169
>>  11: In gBuffer(pol2, byid = T, width = 0) : 11: Polygon object 10 area
>>  2581.63
>>  12: In gBuffer(pol2, byid = T, width = 0) : 12: Polygon object 11 area
>>  40.338
>>  13: In gBuffer(pol2, byid = T, width = 0) : 13: Polygon object 12 area
>>  20.169
>>  14: In gBuffer(pol2, byid = T, width = 0) : 14: Polygon object 14 area
>>  1432
>>  15: In gBuffer(pol2, byid = T, width = 0) : 15: Polygon object 15 area
>>  40.338
>>  16: In gBuffer(pol2, byid = T, width = 0) : 16: Polygon object 16 area
>>  20.169
>>  17: In gBuffer(pol2, byid = T, width = 0) : 17: Polygon object 17 area
>>  20.169
>>  18: In gBuffer(pol2, byid = T, width = 0) : 18: Polygon object 18 area
>>  20.169
>>  19: In gBuffer(pol2, byid = T, width = 0) : 19: Polygon object 19 area
>>  80.676
>>  20: In gBuffer(pol2, byid = T, width = 0) : 20: Polygon object 21 area
>>  20.169
>>  21: In gBuffer(pol2, byid = T, width = 0) : 21: Polygon object 22 area
>>  1230.31
>>  22: In gBuffer(pol2, byid = T, width = 0) : 22: Polygon object 23 area
>>  20.169
>>  23: In gBuffer(pol2, byid = T, width = 0) : 23: Polygon object 25 area
>>  20.169
>>  24: In gBuffer(pol2, byid = T, width = 0) : 24: Polygon object 26 area
>>  20.169
>>  25: In gBuffer(pol2, byid = T, width = 0) : 25: Polygon object 27 area
>>  564.732
>>  26: In gBuffer(pol2, byid = T, width = 0) : 26: Polygon object 28 area
>>  40.338
>>  27: In gBuffer(pol2, byid = T, width = 0) : 27: Polygon object 30 area
>>  504.225
>>  28: In gBuffer(pol2, byid = T, width = 0) : 28: Polygon object 31 area
>>  605.07
>>  29: In gBuffer(pol2, byid = T, width = 0) : 29: Polygon object 33 area
>>  20.169
>>  30: In gBuffer(pol2, byid = T, width = 0) : 30: Polygon object 34 area
>>  20.169
>>  31: In gBuffer(pol2, byid = T, width = 0) : 31: Polygon object 35 area
>>  1210.14
>>  32: In gBuffer(pol2, byid = T, width = 0) : 32: Polygon object 36 area
>>  20.169
>>  33: In gBuffer(pol2, byid = T, width = 0) : 33: Polygon object 37 area
>>  685.746
>>  34: In gBuffer(pol2, byid = T, width = 0) : 34: Polygon object 38 area
>>  40.338
>>  35: In gBuffer(pol2, byid = T, width = 0) : 35: Polygon object 39 area
>>  20.169
>>  36: In gBuffer(pol2, byid = T, width = 0) : 36: Polygon object 40 area
>>  20.169
>>  37: In gBuffer(pol2, byid = T, width = 0) : 37: Polygon object 41 area
>>  80.676
>>  38: In gBuffer(pol2, byid = T, width = 0) : 38: Polygon object 42 area
>>  20.169
>>  39: In gBuffer(pol2, byid = T, width = 0) : 39: Polygon object 43 area
>>  625.239
>>  40: In gBuffer(pol2, byid = T, width = 0) : 40: Polygon object 44 area
>>  20.169
>>  41: In gBuffer(pol2, byid = T, width = 0) : 41: Polygon object 46 area
>>  262.197
>>  42: In gBuffer(pol2, byid = T, width = 0) : 42: Polygon object 47 area
>>  2500.96
>>  43: In gBuffer(pol2, byid = T, width = 0) : 43: Polygon object 48 area
>>  20.169
>>  44: In gBuffer(pol2, byid = T, width = 0) : 44: Polygon object 49 area
>>  40.338
>>  45: In gBuffer(pol2, byid = T, width = 0) : 45: Polygon object 53 area
>>  80.676
>>  46: In gBuffer(pol2, byid = T, width = 0) : 46: Polygon object 56 area
>>  1290.82
>>  47: In gBuffer(pol2, byid = T, width = 0) : 47: Polygon object 57 area
>>  2722.82
>>  48: In gBuffer(pol2, byid = T, width = 0) : 48: Polygon object 59 area
>>  2218.59
>>  49: In gBuffer(pol2, byid = T, width = 0) : 49: Polygon object 60 area
>>  40.338
>>  50: In gBuffer(pol2, byid = T, width = 0) : 50: Polygon object 61 area
>>  121.014
>> 
>> >  spplot(t1)
>>
>>  Thanks, and sorry if i can't be of much help in making suggestions as i
>>  don't know how could this be done
>> 
>>
>>  2015-11-05 8:05 GMT-03:00 Roger Bivand <Roger.Bivand at nhh.no>:
>> 
>> >  On Wed, 4 Nov 2015, Roger Bivand wrote:
>> > 
>> >  On Wed, 4 Nov 2015, Eduardo Diez wrote:
>> > > 
>> > >   Is there any way of doing this or should i forget it and go on using
>> > > >  GRASS
>> > > >   through rgrass7?
>> > > > 
>> > > 
>> > >  I suggest working with Emmanuel Blondel (cleangeo maintainer) to 
>> > >  extend
>> > >  cleangeo (also suggested in an earlier thread today, with apologies to
>> > >  Emmanuel for picking on him!). That package already uses rgeos, and is 
>> > >  a
>> > >  logical place to put GRASS v.clean-like functionality (maybe even
>> > >  encapsulating using GRASs via rgrass7 and a throwaway location).
>> > > 
>> > >  At the moment it is messy, though some rgeos internals do do something
>> > >  like this, but it isn't exposed to users.
>> > > 
>> > 
>> >  A possibility is to use:
>> > 
>> >  set_RGEOS_polyThreshold(1e-2) # for example
>> >  set_RGEOS_warnSlivers(TRUE)
>> > 
>> >  shows the remaining slivers, and:
>> > 
>> >  set_RGEOS_dropSlivers(TRUE)
>> > 
>> >  drops them when using for example:
>> > 
>> >  t1 <- gBuffer(<your_object>, byid=TRUE, width=0)
>> > > 
>> > 
>> >  A buffer of zero width should not have side-effects, but your mileage 
>> >  may
>> >  vary. It will only remove slivers, not dangles. I haven't tried it when 
>> >  it
>> >  also finds a Polygons object under the threshold, which was your initial
>> >  problem.
>> > 
>> >  Roger
>> > 
>> > 
>> > 
>> > >  Roger
>> > > 
>> > > 
>> > > >   Thanks
>> > > > 
>> > > >   2015-10-19 15:03 GMT-03:00 Eduardo Diez <eduardodiez at gmx.com>:
>> > > > 
>> > > > >   Ok. So here's a link to a zip file that contains two shapefiles:
>> > > > >   - pol_to_be_cleaned: the layer from which i'd like to remove 
>> > > > >   small
>> > > > >   polygons
>> > > > >    - pol_cleaned: the layer cleaned with the function v.clean 
>> > > > >    rmarea
>> > > > > >   http://1drv.ms/1GmRWS7
>> > > > > >   The threshold i used for cleaning was 3000 (meaning 3000 
>> > > > > >   squared >
>> > > >  meters).
>> > > > > >   Although i do project it before sending it to GRASS, according 
>> > > > > >   to
>> > > >  the
>> > > > >   official help page it should be able to handle it:
>> > > > >   "Threshold must always be in square meters, also for
>> > > >  latitude-longitude
>> > > > >   locations or locations with units other than meters"
>> > > > > >   Thanks
>> > > > > >   2015-10-19 8:28 GMT-03:00 Roger Bivand <Roger.Bivand at nhh.no>:
>> > > > > > >   On Fri, 16 Oct 2015, Eduardo Diez wrote:
>> > > > > > > >   Dear list,
>> > > > > > >   I'm willing to know if any knows a way of performing tha same
>> > > > thing > > >   i'm
>> > > > > > >   doing through rgrass7 with GRASS when I execute the function
>> > > >  v.clean > > >  with
>> > > > > > >   "rmarea" as the tool argument. That is:
>> > > > > > > > > >   "The rmarea tool removes all areas <= thresh. The 
>> > > > > > > > > >   longest
>> > > > boundary > > >   with
>> > > > > > >   an
>> > > > > > >   adjacent area is removed or all boundaries if there is no
>> > > > adjacent > > >   area.
>> > > > > > >   Area categories are not combined when a small area is merged 
>> > > > > > >   with
>> > > >  a
>> > > > > > >   larger
>> > > > > > >   area."
>> > > > > > > > > >   Basically i have raster of zones within a field. I 
>> > > > > > > > > >   convert
>> > > >  it to
>> > > > > > >   SpatialPolygonsDataFrame and in order to leave only the more
>> > > > > > >   important/meaningful ones i remove the small/sliver with this
>> > > >  tool. > > >  In
>> > > > > > >   general it works fine but having to call an external software
>> > > >  with a
>> > > > > > >   specific version makes the script less portable and you have 
>> > > > > > >   to be
>> > > > > > >   careful
>> > > > > > >   with updates and such. Also you have to write rasters and
>> > > > shapefiles > > >   back
>> > > > > > >   and forth as GRASS can't work with in-memory objects.
>> > > > > > > > > > >   Could you please provide an example of a built-in or
>> > > >  contributed data > >  set
>> > > > > >   (URL, not attachment) with the slivers you mention, so that we 
>> > > > > >   know
>> > > > > >   that we
>> > > > > >   are addressing your problem? I don't think that:
>> > > > > > > >   https://cran.r-project.org/web/packages/cleangeo/index.html
>> > > > > > > >   does this, as it seems to try to repair broken geometries.
>> > > > > > > >   Also note that you need to specify that the area threshold 
>> > > > > > > >   is
>> > > >  in a > >  square
>> > > > > >   planar metric - dropping slivers in unprojected geometries may 
>> > > > > >   be
>> > > >  more
>> > > > > >   complicated.
>> > > > > > > >   Roger
>> > > > > > > > > > >   Does someone know a way of doing this in plain R?
>> > > > > > > > > >   Thanks
>> > > > > > > > > >           [[alternative HTML version deleted]]
>> > > > > > > > > >   _______________________________________________
>> > > > > > >   R-sig-Geo mailing list
>> > > > > > >   R-sig-Geo at r-project.org
>> > > > > > >   https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>> > > > > > > > > > > >   --
>> > > > > >   Roger Bivand
>> > > > > >   Department of Economics, Norwegian School of Economics,
>> > > > > >   Helleveien 30, N-5045 Bergen, Norway.
>> > > > > >   voice: +47 55 95 93 55; fax +47 55 95 91 00
>> > > > > >   e-mail: Roger.Bivand at nhh.no
>> > > > > > > > > 
>> > > > 
>> > > 
>> > > 
>> > > 
>> >  --
>> >  Roger Bivand
>> >  Department of Economics, Norwegian School of Economics,
>> >  Helleveien 30, N-5045 Bergen, Norway.
>> >  voice: +47 55 95 93 55; fax +47 55 95 91 00
>> >  e-mail: Roger.Bivand at nhh.no
>> > 
>> > 
>> 
>
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no



More information about the R-sig-Geo mailing list