[R-sig-Geo] Cleaning up self-intersections

Lyndon Estes lestes at princeton.edu
Fri Dec 14 21:16:18 CET 2012


Dear List,

I am working on a project to have users maps agricultural fields
(mappingafrica.princeton.edu, if anyone is interested or has comments
on what we are trying to do (will be highly appreciated)), and it
makes uses of rgeos functions quite heavily to perform some quality
control checks on user maps versus our definition of "truth".

Given the type of work we are asking of people, we are getting many
"unclean" polygons (self-intersects, overlaps, non-noding
intersections, etc).  I have been writing in various patches to deal
with these as I have been going, but I turn now for some advice on
cleaning up operations. An example taken from the workflow illustrates
a typical problem I encounter that I am looking for an efficient way
to handle. This example is followed by my two questions.

#########################################################
# Begin example

library(rgdal)
library(rgeos)
library(raster)

# Load polygons.  q is the polygon representing "truth",
# u represents the user's efforts
load(url("http://dl.dropbox.com/u/31890709/test_polys.Rdata"))

# My workflow has several attempts to clean up the code after
# retrieving  some important information on the number of crop
# fields perceived by the user
user.nfields <- nrow(u)  # 13 polygons
qa.nfields <- nrow(q)  # 9 fields

# Once that is dispensed with, I am then interested in the areas of
# overlaps between the SpatialPolygons* u and q, so no longer care
# how many individual fields there are in the dataset. Before I can do
# that, I first have to do some cleaning of messy polygons.
# By messy, I mean they have many overlaps
qo <- gOverlaps(q, byid = T)
uo <- gOverlaps(u, byid = T)
qo[which(qo)] <- 1
uo[which(uo)] <- 1
rowSums(qo)
# 0 1 2 3 4 5 6 7 8
# 1 0 0 0 2 2 3 1 1
rowSums(uo)
# 0  1  2  3  4  5  6  7  8  9 10 11 12
# 3  1  1  1  3  1  1  4  2  1  1  2  1

# Overlaps between many, which is expected with manual digitization.
# To prevent failures running gIntersect
# and gDifference at later steps in the workflow, I did this

qu <- gUnaryUnion(q)  # Works Fine
uu <- gUnaryUnion(u)  # Fails!
# In fact, if you use Rstudio (as I do on my Mac), you will
# have the entire program crash

# R from terminal gives this:
# Error: TopologyException: found non-noded intersection between
# LINESTRING
# (643687 -3.20465e+06, 643686 -3.20465e+06) and LINESTRING
# (643687 -3.20465e+06, 643660 -3.20465e+06) at
# 643686.30952548271 -3204653.8822991014
# (I haven't figured out why this causes an Rstudio crash, but that is not
# the concern here)
# Looking more closely at which polygons are offending

gIsValid(u, byid = T)

# 0     1     2     3     4     5     6     7     8     9    10    11    12
# TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
# TRUE  TRUE  TRUE FALSE
# Warning messages:
# 1: In RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid") :
# Self-intersection at or near point 643686.30952548271
# -3204653.8822991014
#2: In RGEOSUnaryPredFunc(spgeom, byid, "rgeos_isvalid") :
# Self-intersection at or near point 642553.28272326116
# -3203984.9120338778

# Looking a plot, the self intersections are not visible,
# but can be fixed with two possible approaches
plot(u[8, ])
u8 <- gSimplify(u[8, ], tol = 0.0001)  # Approach 1
gIsValid(u8)  # TRUE
plot(u8)  # Look identical to original
u8 <- gBuffer(u[8, ], width = 0)  # Approach 2
gIsValid(u8)  # TRUE
plot(u8)  # Looks identical

# Either of those approaches works nicely. However, the second
# offender, which I made to represent an extreme
# case of self-intersection, loses a lot of its area when cleaned with
# these methods:
u13 <- gSimplify(u[13, ], tol = 0.0001)
u13a <- gBuffer(u[13, ], width = 0)
plot(u[13, ])
plot(u13, add = T, col = rgb(1, 0, 0, alpha = 0.7))  # Not identical
plot(u13a, add = T, col = rgb(0, 0, 1, alpha = 0.7))  # Not identical
# You can see from the plots that a large part of the polygon area is lost.
# Ideally I would like to retain the full area, but I have not worked out
# how to do this efficiently
# One solution, involving the raster package
polr <- raster(extent(u[13, ]))
res(polr) <- 10  # Coarser resolution than I would like, for speed
u13r <- rasterize(u[13, ], polr)  # Rasterize
u13f <- rasterToPolygons(u13r, n = 4, dissolve = T)  # Polygonize
gIsValid(u13f)  # TRUE
plot(u13f) # Approximates original polygon

# End example
#####################################

So, here''s my first question.  Is there some other of cleaning
self-intersections up that preserves the area better than the examples
using gBuffer and gSimplify, and which might be more efficient and
provide a results that more closely approximates the original map than
rasterizing and polygonizing? e.g. Is it possible to explode polygons
at the point(s) of self-intersection, in order to create multiple
valid polygons?

My second question: am I barking up the wrong tree here?  Should I
preferentially use something like GRASS's v.clean before I even read
into R?  I haven't until now because 1) I don't know GRASS very well,
2) I want to minimize the number of different routines/software in the
workflow (which currently involves python, postgis/postgres, R,
openlayers).

Will very much appreciate any pointers/comments on this.

Thanks, Lyndon  (sessionInfo() for this example follows)

>From testing on Mac:

R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] C/en_US.UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] raster_2.0-31 rgeos_0.2-9   rgdal_0.7-22  sp_1.0-2

loaded via a namespace (and not attached):
[1] grid_2.15.1    lattice_0.20-6 tools_2.15.1

For linux server, where the work actually gets done:

R version 2.15.1 (2012-06-22)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=C                 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rgdal_0.7-18  raster_2.0-12 rgeos_0.2-7   plyr_1.7.1    stringr_0.6.1
[6] sp_0.9-99

loaded via a namespace (and not attached):
[1] grid_2.15.1    lattice_0.20-6



lestes at princeton.edu
www.princeton.edu/~lestes



More information about the R-sig-Geo mailing list