[R-sig-Geo] Cleaning up self-intersections

Fri Jan 25 18:15:12 CET 2013

Hi,

I am back with the polygon cleaning issue after having looked into it
some more.  We are working on getting pprepair up and running, but I
have failed thus far to get it running and tested because it has a
large number of dependencies that don't seem to all agree with my
Macbook (Lion), so we are putting it on our linux server and will try
it there.

In the meantime, I have come up with a Grass-based solution.  I also
found that postgis's st_MakeValid seems to clean up things alright and
lead to valid polygons upon importing from my postgis database, but
there are problems that persist in downstream operations, which I
suppose has to do with noding, but I am not sure.  I am interested in
cleaning up polygon self-intersections, but I also need to get rid of
polygon overlaps so that I can do map comparisons, which entails a
number of intersects and differences between separate SpatialPolygons
objects.

Here is the solution I have come up with that seems to work. The code
seems to work for the situations I have found so far, so my main
question is: can I do the cleaning and overlap removal more
efficiently than this (assuming I can't get pprepair working)?

Immediately below the first example is a modified version of the first
that illustrate the type of lingering problem that can result even
with valid polygons, passed out of postgis with st_MakeValid applied
first (code for that not illustrated here because I don't have a way
of giving ready access to a postgis database).

##############################################################################################

library(rgdal)
library(rgeos)
library(spgrass6)

load(url("http://dl.dropbox.com/u/31890709/test_polys.Rdata"))
# u and q are the objects of interest
# u is invalid on two of its polygons, q is all valid, both have
overlaps in their polygons
# Since the code is lengthy, to avoid repetition, reassign a new name
to the object to test
pol <- q
gIsValid(pol, byid = T)  # Two invalid polygons
gOverlaps(pol, byid = T)  # Plenty of overlaps

# Step 1, GRASS cleaning
SG <- Sobj_SpatialGrid(pol)$SG
loc <- initGRASS("/Applications/GRASS-6.4.app/Contents/MacOS/",
home=tempdir(), SG, override = TRUE)
tf <- tempfile()
mapset <- execGRASS("g.gisenv", parameters=list(get="MAPSET"), intern=TRUE)
execGRASS("g.gisenv", parameters=list(set=shQuote('MAPSET=PERMANENT')))
prj <- showWKT(proj4string(pol), tf)
execGRASS("g.proj", flags="c", parameters=list(wkt=tf))
execGRASS("g.proj", flags="p")
execGRASS("g.gisenv", parameters=list(set=paste("'MAPSET=", mapset,
"'", sep="")))
execGRASS("g.region", flags="d")
writeVECT6(pol, "pol")
o <- readVECT6("pol", with_c=FALSE, remove.duplicates = F)
polylist <- lapply(1:length(unique(o at data$cat)), function(x)
gUnaryUnion(o[o at data$cat == x, ]))
newu <- SpatialPolygons(lapply(1:length(unique(o at data$cat)), function(x) {
 Polygons(polylist[[x]]@polygons[[1]]@Polygons, x)
}))  # Union each polygons and its little overlapping bits to make valid polygon
gIsValid(newu, byid = T)  # All valid

# Step 2, remove overlaps between polygons
outpoly <- newu
newpoly2 <- lapply(1:length(outpoly), function(j) outpoly[j, ])
itab <- gOverlaps(outpoly, byid = T)
itab[which(itab)] <- 1
itab1 <- rowSums(itab)
i <- which(itab1 > 0)[1]

while(i <= length(outpoly)) {
  print(unname(i))
  newpoly2[[i]] <- gDifference(spgeom1 = outpoly[i, ], spgeom2 =
gUnaryUnion(outpoly[-i, ]), byid = T)
  outpoly <- SpatialPolygons(lapply(1:length(newpoly2), function(x) {
    Polygons(newpoly2[[x]]@polygons[[1]]@Polygons, x)
  }))
  itab <- gOverlaps(outpoly, byid = T)
  itab[which(itab)] <- 1
  itab1 <- rowSums(itab)
  ct <- which(itab1 > 0)[1]
  i <- ifelse(is.na(ct), length(outpoly) + 1, ct)
}
gOverlaps(outpoly, byid = T)  # No overlaps
rm(list = ls())

###############################################################################################

Now, here is how a valid polygons will fail.  q is a valid polygon
before going through the Grass clean up process. However, if you skip
Step 1, and jump straight to Step 2, you will get caught in an endless
loop (modifed code below salutation).  Any thoughts as to why this is?

Any recommendations for how to improve this very clunky process will
be highly appreciated.

Thanks, Lyndon

load(url("http://dl.dropbox.com/u/31890709/test_polys.Rdata"))
# u and q are the objects of interest
# u is invalid on two of its polygons, q is all valid, both have
overlaps in their polygons
# Since the code is lengthy, to avoid repetition, reassign a new name
to the object to test
pol <- q
gIsValid(pol, byid = T)  # Two invalid polygons
gOverlaps(pol, byid = T)  # Plenty of overlaps

# Step 1 removed

# Step 2, remove overlaps between polygons
outpoly <- pol
newpoly2 <- lapply(1:length(outpoly), function(j) outpoly[j, ])
itab <- gOverlaps(outpoly, byid = T)
itab[which(itab)] <- 1
itab1 <- rowSums(itab)
i <- which(itab1 > 0)[1]

while(i <= length(outpoly)) {
 print(unname(i))
 newpoly2[[i]] <- gDifference(spgeom1 = outpoly[i, ], spgeom2 =
gUnaryUnion(outpoly[-i, ]), byid = T)
 outpoly <- SpatialPolygons(lapply(1:length(newpoly2), function(x) {
  Polygons(newpoly2[[x]]@polygons[[1]]@Polygons, x)
 }))
 itab <- gOverlaps(outpoly, byid = T)
 itab[which(itab)] <- 1
 itab1 <- rowSums(itab)
 ct <- which(itab1 > 0)[1]
 i <- ifelse(is.na(ct), length(outpoly) + 1, ct)
}
gOverlaps(outpoly, byid = T)  # No overlaps
rm(list = ls())