[R-sig-Geo] Finding the county shapefile polygon closest to along-lat position
Torleif Markussen Lunde
torleif.lunde at cih.uib.no
Fri May 22 13:06:51 CEST 2009
Hi Joe
Like Roger says you should first look at the possible datum problem. The
example below shows one method to find the closest polygon, and to add that
to the overlay. Notice however that if you change the projection and/or
ellipsoid the closest points (end of lines) are different. Closest line in
this case is calculated by:
abs((x2-x1)*(y1-y0)-(x1-x0)*(y2-y1))/sqrt(((x2-x1)^2+(y2-y1)^2))^2)
where x0, and y0 is the point, while x1, y1, and x2, y2 are the line ends.
To find the two closest points, spDistsN1 is used.
Try to play with un-commenting polys <- spTransform(polys, CRS("+proj=moll
+ellps=WGS84"))
and changing ll.logical=FALSE/TRUE (FALSE = Euclidean, TRUE = Great Circle
distance)
One of the things you will see is that the closest points are different when
you calculate the distance in km (ll.logical=TRUE) combined with
proj4string=CRS("+proj=moll +ellps=WGS84")
I hope this gives some insight in the problem.
The code can also be downloaded from http://open.uib.no/R/mindist.R
I do not guarantee the code is bug free.
You could try to replace polys with your polygon, and ssample with your
points, and see if it makes sense.
Best wishes
Torleif
#################
library(rgdal)
#Make data
set.seed(7)
grd <- GridTopology(c(1,1), c(1,1), c(2,2))
polys <- as.SpatialPolygons.GridTopology(grd)
proj4string(polys) <- CRS(" +proj=longlat +ellps=WGS84" )
## Try uncommenting this line, and see what are the closest points in the last
figure.
## In this case the same closest polygons are found, but not the same two
closest points
#polys <- spTransform(polys, CRS("+proj=moll +ellps=sphere"))
ssample <- spsample(polys,n=10,type="random")
#displace points
slot(ssample, 'coords') <- slot(ssample, 'coords')-mean(
slot(ssample, 'coords'))*.25
#overlay
overl <- overlay(polys, ssample)
#Show overlay
overl
#Which points are outside?
wh <- which(is.na(overl))
#Make plots
plot(polys, xlim=(c(bbox(polys)[1,])-c(1, -1)),
ylim=(c(bbox(polys)[2,])-c(1, -1)))
for (i in 1:4) points(slot(slot((slot(polys, 'polygons'))[[i]], 'Polygons')
[[1]], 'coords')[,1], slot(slot((slot(polys, 'polygons'))[[i]], 'Polygons')
[[1]], 'coords')[,2], col = i)
points(ssample)
for (i in 1:length(wh)) {
points(slot(ssample, 'coords')[wh[i],1], slot(ssample, 'coords')[wh[i],2],
col=(i+1))
}
# Number of polygons to sapply over (or number of polygons).
# With more polygons you could also try to only scan the lines around
# the "4" polygons which center is closest to the outlying point(?)
nit <- dim(coordinates(polys))[1]
#Find the closest polygon, and paste the data in overl
overl.na <- sapply(1:length(wh), function (x) {
x0 <- slot(ssample, 'coords')[wh[x],1]
y0 <- slot(ssample, 'coords')[wh[x],2]
mn <- sapply(1:nit, function(y) {
cc <- slot(slot((slot(polys, 'polygons'))[[y]], 'Polygons')
[[1]], 'coords')
x1 <- cc[1:(dim(cc)[1]-1),1]
x2 <- cc[2:dim(cc)[1],1]
y1 <- cc[1:(dim(cc)[1]-1),2]
y2 <- cc[2:dim(cc)[1],2]
#calculate distance
mn <-
mean(abs((x2-x1)*(y1-y0)-(x1-x0)*(y2-y1))/sqrt(((x2-x1)^2+(y2-y1)^2)))
return(mn)
})
wm <- which.min(mn)
return(list(wm, x0, y0))
})
##Can leave out. This section is just for plotting lines
#lines. Shortest distance
lin <- sapply(2:(length(wh)+1), function (x) {
p1 <- matrix(unlist(overl.na[-1,(x-1)]),,2)
pol.pos <- unlist(overl.na[1,])
pol1 <- slot(slot((slot(polys, 'polygons'))[[pol.pos[x-1]]], 'Polygons')
[[1]], 'coords')[-1,]
dst <- spDistsN1(pol1, p1, longlat=FALSE)
wh.dst <- which.min(dst)
ln <- Line(matrix( c(p1[,1], pol1[wh.dst, 1], p1[,2], pol1[wh.dst, 2]),
2 ))
return(ln)
})
#lines, second shortest distance
lin2 <- sapply(2:(length(wh)+1), function (x) {
p1 <- matrix(unlist(overl.na[-1,(x-1)]),,2)
pol.pos <- unlist(overl.na[1,])
pol1 <- slot(slot((slot(polys, 'polygons'))[[pol.pos[x-1]]], 'Polygons')
[[1]], 'coords')[-1,]
dst <- spDistsN1(pol1, p1, longlat=FALSE)
wh.dst <- which.min(dst)
dst[wh.dst] <- NA
wh.dst2 <- which.min(dst)
ln2 <- Line(matrix( c(p1[,1], pol1[wh.dst2, 1], p1[,2], pol1[wh.dst2, 2]),
2 ))
return(ln2)
})
#Convert to spatial lines
sl1 <- Lines(lin)
spl1 <- SpatialLines(list(sl1), proj4string = CRS(proj4string(polys)))
sl2 <- Lines(lin2)
spl2 <- SpatialLines(list(sl2), proj4string = CRS(proj4string(polys)))
##
#Add the new values to the overlay
overl[wh] <- unlist(overl.na[1,])
#Show overlay
overl
#Add polygon
new.points <- SpatialPointsDataFrame(ssample, data.frame(polyid=overl))
polys <- SpatialPolygonsDataFrame(polys,
data.frame(ID=c("p1", "p2", "p3", "p4")))
mypalette <- "red"
require(lattice)
trellis.par.set(sp.theme(regions = list(col = mypalette)))
spplot(new.points,
sp.layout = list(list("sp.polygons", polys, fill =
c("grey10", "grey40", "grey60", "grey90")),
list("sp.lines", spl1),
list("sp.lines", spl2)),
xlim=(c(bbox(polys)[1,])-c(mean( slot(ssample, 'coords'))*.5, -mean(
slot(ssample, 'coords'))*.5)),
ylim=(c(bbox(polys)[2,])-c(mean( slot(ssample, 'coords'))*.5, -mean(
slot(ssample, 'coords'))*.5)))
##########
On Friday 22 May 2009 10:44:38 am Roger Bivand wrote:
> On Thu, 21 May 2009, Pilar Tugores Ferra wrote:
> > Dear Joe,
> > Maybe you could use a similar process that Marcelino suggested to me
> > yesterday in order to compute the shortest distance from points to a
> > polyline. You need to 1)convert your point data to ppp object in
> > spatstat, 2)coerce your polygon data to a psp object in spatstat.
> > 3)nncross between the two.
> > You'll get "dist" - the nearest neighbour distance between the two
> > patterns and "which" the nearest neighbour index of the second pattern.
> > Maybe it won't be so straightforward to know which ids of the coerced
> > polyline correspond to each polygon but I am not sure about this point.
>
> Thanks, Pilar!
>
> The only possible weakness in this might be that the input points and
> polygon boundaries are in geographical, not projected (planar)
> coordinates, so if there are differences in distances on the plane and
> great circle distances, they might lead to the "outside" points being
> assigned to the wrong county.
>
> Depending on how many there are, you might consider two alternatives, one
> visual inspection in Google Earth or similar (export the "outside" points
> with IDs as one KML, and the county boundaries you are using with
> writeOGR() in rgdal (or use those built into GE)); the other would be to
> use the closeness to the label points of the counties of the "outside"
> points - coordinates() of the SpatialPolygons* object retrieves these, so
> looping over the "outside" points in spDistsN1() in sp with longlat=TRUE
> would give the great circle distances.
>
> Of course, the underlying issue is why they are outside - is this a datum
> shift problem (counties in NAD27 and points in NAD83/WGS84)? If you fixed
> that, they would match better.
>
> Hope this helps,
>
> Roger
>
> > Best regards,
> > Pilar
> >
> >
> > -----Mensaje original-----
> > De: r-sig-geo-bounces at stat.math.ethz.ch en nombre de BRWIN338 at aol.com
> > Enviado el: jue 21/05/2009 21:14
> > Para: r-sig-geo at stat.math.ethz.ch
> > Asunto: [R-sig-Geo] Finding the county shapefile polygon closest to
> > along-lat position
> >
> > Greetings
> > I have a large number of long-lat locations dispersed over the US and
> > need
> >
> > to identify which US county that each point is located in (or nearest
> > to).
> >
> > After reading the past posts and Roger's book, I have been able to use
> > the overlay function
> > to identify the appropriate counties for the set of locations
> > with long-lats lying within or on a polygon boundary. However, due to
> > longlat precision errors (I am assuming), some of the points lie
> > outside all of
> > my shapefile's county polygon boundaries.
> >
> > Is there an R function similar to "overlay" that I could use to find
> > which
> >
> > county polygon is closest to each of my longlat points that do not lie
> > within the shapefile's polygons? I have spent quite a bit of time
> > searching
> > and browsing past list discussions and can seem to find my answer.
> >
> > My apologies if I have missed an obvious answer.
> >
> > Joe
> >
> >
> > **************Huge savings on HDTVs from Dell.com!
> > (http://pr.atwola.com/promoclk/100126575x1221836042x1201399880/aol?redir=
> >http:%2F%2Fad.doubleclick.ne t%2Fclk%3B215073686%3B37034322%3Bb)
> >
> > [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > R-sig-Geo at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
> >
> >
> > La informaci?n contenida en este e-mail y sus ficheros adjuntos es
> > totalmente confidencial y no deber?a ser usado si no fuera usted alguno
> > de los destinatarios. Si ha recibido este e-mail por error, por favor
> > avise al remitente y b?rrelo de su buz?n o de cualquier otro medio de
> > almacenamiento. This email is confidential and should not be used by
> > anyone who is not the original intended recipient. If you have received
> > this e-mail in error please inform the sender and delete it from your
> > mailbox or any other storage mechanism. [[alternative HTML version
> > deleted]]
More information about the R-sig-Geo
mailing list