[R-sig-Geo] Question for cleaning shapefiles for dissertation

Thu Mar 18 14:02:49 CET 2021

Hello,

First, thank you for letting me join this group. Being able to (quietly) follow discussions among so many great minds, in an area so central to my day-to-day work, is something I am really looking forward to.

I do have a question that I hope someone can help with. For my dissertation, I am building an areal model on Ohio precincts. The shapefile (https://github.com/mggg/ohio-precincts) is notoriously full of slivers, which makes it hard to build a correct adjacency matrix.

I've come up with a wonky solution (code at bottom): after loading the shapefile and filtering by county, I manually increase the snap tolerance by 0.01 and check whether the number of nonzero links increases. If it does, I plot the old and new neighbour links and eyeball the new link(s) to decide whether each is "real" or whether the snap has grown so large that I am linking precincts that are not actually adjacent. I am hoping to find a snap level that links most of the truly adjacent precincts without linking too many that are not. I've gotten through ten counties, and it is looking like the best snap is ~30.
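The trial-and-error search above can be sketched as a loop. This is a minimal sketch, assuming the spdep and sf packages are installed and that the county layer is a polygon object that `poly2nb()` accepts; the helper name `sweep_snaps()`, the toy grid and the snap values are hypothetical, for illustration only. For each snap it records the number of nonzero links and flags the snaps at which that count went up, so only those need a visual check.

```r
library(sf)
library(spdep)

# Hypothetical helper: for each snap tolerance, build a (queen) neighbours
# list with poly2nb() and count its nonzero links as the sum of card().
sweep_snaps <- function(polys, snaps) {
  n_links <- vapply(snaps, function(s) {
    nb <- poly2nb(polys, snap = s)
    sum(card(nb))
  }, numeric(1))
  # Flag snaps where the link count rose relative to the previous snap
  increased <- c(FALSE, diff(n_links) > 0)
  data.frame(snap = snaps, n_links = n_links, increased = increased)
}

# Toy stand-in for a county: a 3 x 3 grid of unit squares, each shrunk by
# 0.05 so that neighbouring cells are separated by 0.1-wide slivers.
area <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 3)))
grid <- st_make_grid(area, n = c(3, 3))
polys <- st_sf(geometry = st_buffer(grid, -0.05))

res <- sweep_snaps(polys, snaps = c(0, 0.05, 0.2, 0.5))
res[res$increased, ]  # only these snaps need to be eyeballed
```

With the real data, `polys` would be `df_county` and `snaps` whatever grid of candidate tolerances you want to try; poly2nb() may warn about regions with no neighbours at the smallest snaps.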

This is obviously time-consuming, and in cities it can be hard to tell whether precincts are actually adjacent because they are so small. My questions are:

1) Is there a way to extract the number of nonzero links from an object of class 'nb' and assign it to a variable? That would let me automate the search and examine only the snaps that increased the number of links, saving time.
2) Am I missing a much more straightforward way to clean the shapefile altogether?
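Regarding question 1, a minimal sketch, assuming spdep: `card()` returns the number of neighbours of each region, so summing it gives the "Number of nonzero links" figure that is printed for an nb object. The helper name `n_links()` is hypothetical, and the `cell2nb()` grids are only toy stand-ins for two neighbour lists built with different snaps.

```r
library(spdep)

# Hypothetical helper: total number of nonzero links in an "nb" object,
# i.e. the sum of the per-region neighbour counts returned by card().
n_links <- function(nb) sum(card(nb))

# Toy stand-ins for nb_a and nb_b: rook vs queen contiguity on a 3 x 3 grid
nb_a <- cell2nb(3, 3, type = "rook")
nb_b <- cell2nb(3, 3, type = "queen")

links_a <- n_links(nb_a)  # 24
links_b <- n_links(nb_b)  # 40

# Only bother plotting when the larger snap actually added links
if (links_b > links_a) {
  message("larger snap added ", links_b - links_a, " links")
}
```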

Thank you for your time,

Sean

library(rgdal)  # readOGR()
library(sp)     # coordinates(), subset() and plot() for Spatial* objects
library(spdep)  # poly2nb() and the plot() method for nb objects

df <- readOGR("OH_precincts.shp")
vect <- as.character(unique(df$COUNTY))
j <- 1 #####Increase this to move on to successive counties

selected_county <- vect[j]
df_county <- subset(df, COUNTY == selected_county)

ia <- 0
ib <- 0.01 #####First compare no snap with a snap of 0.01

nb_a <- poly2nb(df_county, snap = ia)
nb_b <- poly2nb(df_county, snap = ib)
coords <- coordinates(df_county)

nb_a
nb_b #####Printing both shows whether the larger snap gives more links than the smaller snap

plot(df_county)
plot(nb_b, coords, col = "blue", add = TRUE, lwd = 2)
plot(nb_a, coords, col = "grey", add = TRUE, lwd = 2) #####Grey is drawn over blue, so only the new connection(s) stay blue

ia <- ia + 0.01
ib <- ib + 0.01 #####Increases both snaps by 0.01
#####so if this is the first time you run it, your next comparison will be snaps of 0.01 and 0.02
#####To keep the snaps from resetting, I highlight from the assignment of nb_a down to here and press Ctrl+Enter until the number of links increases
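Instead of relying on the grey-over-blue overlay, spdep's `diffnb()` can extract just the links that appear in one neighbours list but not the other, which may be easier to inspect in dense city precincts. A minimal sketch, assuming spdep; the `cell2nb()` grids and grid-centre coordinates are hypothetical stand-ins, and with the real data `nb_a`, `nb_b` and `coords` would come from the code above.

```r
library(spdep)

# Toy stand-ins for the two neighbours lists (rook vs queen on a 3 x 3 grid)
nb_a <- cell2nb(3, 3, type = "rook")
nb_b <- cell2nb(3, 3, type = "queen")

# Links present in one list but not the other (here: the diagonal links)
nb_new <- diffnb(nb_a, nb_b)
sum(card(nb_new))

# Grid-cell centres as plotting coordinates for the toy example
coords <- as.matrix(expand.grid(x = 1:3, y = 1:3))
plot(nb_b, coords, col = "grey", lwd = 2)
plot(nb_new, coords, col = "red", lwd = 2, add = TRUE)
```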