[R-sig-Geo] Identifying which points are in which cluster
macqueen1 at llnl.gov
Thu Sep 25 20:25:33 CEST 2014
In the following reproducible example I have created a
SpatialPointsDataFrame with three clusters of points. What I'm looking for
is a (good) way to add to the SPDF a column which identifies which cluster
each point is in.
--- begin example ---
library(sp)     # SpatialPointsDataFrame()
library(rgeos)  # gBuffer(), gUnaryUnion()
## construct a SpatialPointsDataFrame with three "clusters"
ctrs <- cbind(x=c( 7000, 8000, 9000),
y=c(12000, 13000, 14000))
pts <- cbind(rep(ctrs[,1],3)+runif(9,-20,20),
rep(ctrs[,2],3) + runif(9, -20,20))
pts <- SpatialPointsDataFrame(pts, data.frame(name=letters[1:9]))
## now pretend I don't know which points are in which cluster
tmp1 <- gBuffer(pts, width=30, byid=TRUE)
tmp2 <- gUnaryUnion(tmp1)
## tmp2 is now a SpatialPolygons object
## with three Polygons, one for each "cluster"
plot(tmp2)
points(ctrs, col='red', pch=3, cex=0.5)
--- end example ---
Now I need a way to identify which point is in which polygon, and add a
variable to the data frame slot of pts with that information.
I'm sure I can work it out by digging into the structure of tmp2 and
pulling out the individual polygons using a loop, but I'm hoping there's a
higher level solution, possibly using some combination of lapply() or
sapply() with over(). But I have not been able to come up with it.
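
To make concrete the kind of thing I have in mind, here is a rough sketch, not a tested solution. It assumes that gUnaryUnion() called without an id returns a single multi-part feature, and that sp::disaggregate() can split that feature into one feature per part so that over() has something to index. The names parts and cluster, and the set.seed() call, are mine and only there for illustration.

```r
library(sp)
library(rgeos)

set.seed(1)  # only so the sketch is repeatable
ctrs <- cbind(x = c(7000, 8000, 9000), y = c(12000, 13000, 14000))
xy <- cbind(rep(ctrs[, 1], 3) + runif(9, -20, 20),
            rep(ctrs[, 2], 3) + runif(9, -20, 20))
pts <- SpatialPointsDataFrame(xy, data.frame(name = letters[1:9]))

# Buffer each point, then dissolve overlapping buffers
tmp2 <- gUnaryUnion(gBuffer(pts, width = 30, byid = TRUE))

# Assumption: tmp2 is one multi-part feature; split it into separate features
parts <- disaggregate(tmp2)

# For each point, over() gives the index of the feature that contains it
pts$cluster <- over(pts, parts)
table(pts$cluster)
```

The cluster numbers follow the order of the features in parts, so they are arbitrary labels rather than anything tied to ctrs.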
By the way, searching through old r-sig-geo emails, I found
tmp.cc <- hclust(dist(coordinates(pts)), "complete")
tmp.50 <- cutree(tmp.cc, h=50)
and this works for this example (thanks to marcelino.delacruz at upm.es), but
it's not clear to me which approach will be better in the long run for my
And I see something could be done with spdep::dnearneigh(), but the output
structure is a little complex and I don't understand it well enough.
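
As far as I can tell, the spdep route would look something like the sketch below. It assumes that two points belong to the same cluster when they are within 60 units of each other (twice the 30-unit buffer width used above), directly or through a chain of such neighbours, and it uses spdep::n.comp.nb() to label the connected components of the neighbour list. The 60-unit threshold and the object names are my own choices for illustration.

```r
library(spdep)

set.seed(1)  # only so the sketch is repeatable
ctrs <- cbind(x = c(7000, 8000, 9000), y = c(12000, 13000, 14000))
xy <- cbind(rep(ctrs[, 1], 3) + runif(9, -20, 20),
            rep(ctrs[, 2], 3) + runif(9, -20, 20))

# Neighbours are all points between 0 and 60 units away
nb <- dnearneigh(xy, d1 = 0, d2 = 60)

# Label the connected components of the neighbour graph
comp <- n.comp.nb(nb)
comp$nc       # number of components found
comp$comp.id  # component label for each point, in the row order of xy
```

comp$comp.id could then be attached to the data frame slot of pts in the same way as any other column.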
So I would still appreciate suggestions for a solution based on points in
polygons and the data structures in the example.
Thanks very much
Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550