[R-sig-Geo] Identifying which points are in which cluster

MacQueen, Don macqueen1 at llnl.gov
Thu Sep 25 20:25:33 CEST 2014


In the following reproducible example I have created a
SpatialPointsDataFrame with three clusters of points. What I'm looking for
is a (good) way to add to the SPDF a column which identifies which cluster
each point is in.

--- begin example ---

require(sp)
require(rgeos)


## construct a SpatialPointsDataFrame with three "clusters"
ctrs <- cbind(x=c( 7000,  8000,  9000),
              y=c(12000, 13000, 14000))

pts <- cbind(rep(ctrs[,1],3)+runif(9,-20,20),
             rep(ctrs[,2],3) + runif(9, -20,20))

plot(pts)

pts <- SpatialPointsDataFrame(pts, data.frame(name=letters[1:9]) ,
proj4string=CRS('+init=epsg:26943'))

plot(pts)

## now pretend I don't know which points are in which cluster
tmp1 <- gBuffer(pts, width=30, byid=TRUE)
tmp2 <- gUnaryUnion(tmp1)

## tmp2 is now a SpatialPolygons object holding a single Polygons
## feature made up of three Polygon parts, one for each "cluster"

plot(tmp2, usePolypath=FALSE)
plot(pts, add=TRUE)
points(ctrs, col='red', pch=3, cex=0.5)

--- end example ---

Now I need a way to identify which point is in which polygon, and add a
variable to the data frame slot of pts with that information.

I'm sure I can work it out by digging into the structure of tmp2 and
pulling out the individual polygons using a loop, but I'm hoping there's a
higher level solution, possibly using some combination of lapply() or
sapply() with over(). But I have not been able to come up with it.
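
One way this might be done with over(), sketched under a few assumptions.
gUnaryUnion() called without an id merges every buffer into one feature, so
over() on that result alone would put all points in polygon 1. The sketch
first splits the merged geometry with sp::disaggregate() (assumed to be
available in the installed sp version) so that each cluster becomes its own
feature. The names `parts` and `cluster` are made up for illustration, and
the CRS is left out only to keep the sketch self-contained.

```r
library(sp)
library(rgeos)

set.seed(1)  # only so the sketch is repeatable

## same construction as in the example, without the CRS
ctrs <- cbind(x = c(7000, 8000, 9000),
              y = c(12000, 13000, 14000))
xy <- cbind(rep(ctrs[, 1], 3) + runif(9, -20, 20),
            rep(ctrs[, 2], 3) + runif(9, -20, 20))
pts <- SpatialPointsDataFrame(xy, data.frame(name = letters[1:9]))

## buffer each point, then merge overlapping buffers into one feature
merged <- gUnaryUnion(gBuffer(pts, width = 30, byid = TRUE))

## split the merged geometry into one feature per connected piece
parts <- disaggregate(merged)

## over() returns, for each point, the index of the feature it falls in
pts$cluster <- over(pts, parts)

table(pts$cluster)
```

The cluster numbers follow the order of the features in `parts`, which need
not match the order of the rows in ctrs.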

Thanks
-Don


By the way, searching through old r-sig-geo emails, I found
  tmp.cc <- hclust(dist(coordinates(pts)), "complete")
  tmp.50 <- cutree(tmp.cc, h=50)

and this works for this example (thanks to marcelino.delacruz at upm.es), but
it's not clear to me which approach will be better in the long run for my
applications.
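
To compare the two approaches, here is a rough sketch using only base R. As
far as I can tell, merging buffers of width 30 groups any two points that are
less than about 60 apart, chaining through intermediate points, which is
what single linkage cut at h=60 does. Complete linkage cut at h=50 instead
requires every pair within a cluster to be at most 50 apart, so it could
split a cluster whose extreme points are farther apart than that. The names
`cl.single` and `cl.complete` are made up for illustration, and the
correspondence with the buffers is approximate because gBuffer() approximates
circles with polygons.

```r
set.seed(1)  # only so the sketch is repeatable

## same coordinates as in the example, as a plain matrix
ctrs <- cbind(x = c(7000, 8000, 9000),
              y = c(12000, 13000, 14000))
xy <- cbind(rep(ctrs[, 1], 3) + runif(9, -20, 20),
            rep(ctrs[, 2], 3) + runif(9, -20, 20))

d <- dist(xy)

## complete linkage: every pair in a cluster is within h of each other
cl.complete <- cutree(hclust(d, "complete"), h = 50)

## single linkage: points are chained together through gaps smaller than h,
## roughly what overlapping buffers of width 30 do
cl.single <- cutree(hclust(d, "single"), h = 60)

## cross-tabulate the two labelings to see where they disagree, if at all
table(complete = cl.complete, single = cl.single)
```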

And I see something could be done with spdep::dnearneigh(), but the output
structure is a little complex and I don't understand it well enough.
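
For what it is worth, a minimal sketch of how the spdep route might look,
assuming spdep is installed. dnearneigh() returns a neighbour list (class
"nb") with one integer vector of neighbour indices per point, and
n.comp.nb() labels the connected components of that list. The distance
bound of 60 is my assumption, chosen to mirror buffers of width 30.

```r
library(spdep)

set.seed(1)  # only so the sketch is repeatable

## same coordinates as in the example, as a plain matrix
ctrs <- cbind(x = c(7000, 8000, 9000),
              y = c(12000, 13000, 14000))
xy <- cbind(rep(ctrs[, 1], 3) + runif(9, -20, 20),
            rep(ctrs[, 2], 3) + runif(9, -20, 20))

## neighbours are all points closer than 60 (assumed threshold)
nb <- dnearneigh(xy, d1 = 0, d2 = 60)

## connected components of the neighbour graph act as clusters
comp <- n.comp.nb(nb)

comp$nc       # number of components found
comp$comp.id  # component label for each point
```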


So I would still appreciate suggestions for a solution based on points in
polygons and the data structures in the example.

Thanks very much
-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062


