[R-sig-Geo] Question for cleaning shapefiles for dissertation
Rainer Krug
Thu Mar 18 14:41:19 CET 2021
Hi Sean,
Shapefiles are notoriously bad, and you have just found one reason why.
I would suggest looking into e.g. GRASS (https://grass.osgeo.org) or QGIS (https://www.qgis.org/en/site/) for the cleaning and joining. Afterwards, you can go back to R for further analysis.
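If you would rather try a repair without leaving R first, sf's st_make_valid() (GEOS-based) fixes many of the invalid geometries that produce slivers. This is a different route from the GRASS/QGIS one above, and the self-intersecting "bowtie" polygon below is invented purely for illustration:

```r
library(sf)  # st_polygon(), st_is_valid(), st_make_valid()

# A self-intersecting "bowtie" ring -- the kind of invalid geometry
# that sliver-ridden shapefiles are full of
bowtie <- st_polygon(list(rbind(c(0, 0), c(2, 2), c(2, 0), c(0, 2), c(0, 0))))

st_is_valid(bowtie)   # FALSE: the ring crosses itself

fixed <- st_make_valid(bowtie)
st_is_valid(fixed)    # TRUE: rebuilt as a valid (multi)polygon
```

This will not merge slivers back into their parent precincts on its own, but running it before building neighbour lists removes one common source of broken adjacencies.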
Cheers,
Rainer
> On 18 Mar 2021, at 14:02, Sean Trende <strende using realclearpolitics.com> wrote:
>
> Hello,
>
> First, thank you for allowing me to join this group. To be able to (quietly) observe discussions from so many great minds in an area so important for my day-to-day job is something I am greatly looking forward to.
>
> I do have a question that I hope someone can help with. For my dissertation, I am building an areal model off of Ohio precincts. The shapefile (https://github.com/mggg/ohio-precincts) is notoriously filled with slivers, which complicates the adjacency matrix.
>
> I've come up with a wonky solution (code at bottom) where, after loading the shapefile and filtering by county, I manually increase the size of the snap by 0.01 and look to see whether the number of nonzero links increases. If it does, I can then plot the new and old adjacency matrices, and eyeball the new link(s) to determine whether it is "real" or whether the snap has grown so large that I am linking precincts that are not actually adjacent. I am hoping I can find a snap level that links most of the truly adjacent precincts without linking too many precincts that are not actually adjacent. I've gotten through ten counties, and it is looking like the best snap is ~30.
>
> This is obviously time-consuming, and in cities it can be tough to see whether precincts are actually adjacent given their small size. My questions are:
>
> 1) Is there a way to extract the number of nonzero links from an 'nb'-class object and assign it to a variable? This would allow me to automate the inquiry and examine only those snaps that increased the number of links, saving time.
> 2) Am I missing a much more straightforward solution to cleaning the shapefile entirely?
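On question 1, spdep does expose this: card() returns each region's neighbour count, so sum(card(nb)) reproduces the "Number of nonzero links" figure that printing an nb object shows, and that makes the snap sweep automatable. A minimal sketch — the two-square SpatialPolygons object is invented here just to give poly2nb() something to work on:

```r
library(sp)     # Polygon(), Polygons(), SpatialPolygons()
library(spdep)  # poly2nb(), card()

# Two unit squares sharing an edge: a toy stand-in for two adjacent precincts
p1 <- Polygons(list(Polygon(cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0)))), "a")
p2 <- Polygons(list(Polygon(cbind(c(1, 2, 2, 1, 1), c(0, 0, 1, 1, 0)))), "b")
toy <- SpatialPolygons(list(p1, p2))

# card() gives the neighbour count per region; its sum is the
# "nonzero links" number from print.nb, now assignable to a variable
n_links <- function(poly, snap = 0) sum(card(poly2nb(poly, snap = snap)))

# Sweep a vector of snap values and keep only those where the link
# count jumps, so only the interesting snaps need to be eyeballed
snaps <- seq(0, 0.5, by = 0.01)
links <- sapply(snaps, function(s) n_links(toy, snap = s))
interesting <- snaps[which(diff(links) > 0) + 1]
```

Replacing `toy` with your `df_county` turns this into the loop you are currently driving by hand with Ctrl+Enter.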
>
> Thank you for your time,
>
> Sean
>
> library(rgdal)  # readOGR()
> library(spdep)  # poly2nb()
> library(sp)     # coordinates()
>
> df <- readOGR("OH_precincts.shp")
> vect <- as.character(unique(df$COUNTY))
> j <- 1 #####Increase this as you want to do successive counties
>
> selected_county <- vect[j]
> df_county <- subset(df, COUNTY == selected_county)
>
> ia <- 0
> ib <- 0.01 #####First compare no snap with a snap of 0.01
>
> nb_a <- poly2nb(df_county, snap = ia)
> nb_b <- poly2nb(df_county, snap = ib)
> coords <- coordinates(df_county)
>
> nb_a
> nb_b #####This allows you to look and see if there are more links with the larger snap than with the smaller snap
>
> plot(df_county)
> plot(nb_b, coords, col = "blue", add = TRUE, lwd = 2)
> plot(nb_a, coords, col = "grey", add = TRUE, lwd = 2) #####Because of the overlay, the new connection will be blue
>
> ia <- ia + 0.01
> ib <- ib + 0.01 ##### increases the size of the snap by 0.01, and increases the comparison by a similar amount
> #####so if this is the first time you run it, your next comparison will be snaps of 0.01 and 0.02
> #####To keep it from resetting, I just highlight up to the assignment of nb_a, and then hit Ctrl+Enter until the number of links increases
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo