[R-sig-Geo] Aggregating points based on distance

Barry Rowlingson
Wed Mar 13 19:33:53 CET 2019


On Wed, Mar 13, 2019 at 6:14 PM Andy Bunn <bunna using wwu.edu> wrote:

> I would like to create averages of all the variables in a
> SpatialPointsDataFrame when points are within a specified distance of each
> other. I have a method for doing this but it seems like a silly way to
> approach the problem. Any ideas for doing this using modern syntax
> (especially of the tidy variety) would be appreciated.
>
>
> To start, I have a SpatialPointsDataFrame with several variables measured
> for each point. I'd like to get an average value for each variable for
> points within a specified distance. E.g., getting average cadmium values
> from the meuse data for points within 100 m of each other:
>
>     library(sf)
>     library(sp)
>     data(meuse)
>     pts <- st_as_sf(meuse, coords = c("x", "y"), remove=FALSE)
>     pts100 <- st_is_within_distance(pts, dist = 100)
>     # can use sapply to get mean of a variable. E.g., cadmium
>     sapply(pts100, function(x){ mean(pts$cadmium[x]) })
>
>
If this is the method you call "silly" then I don't see anything silly at
all here, only efficient, well-written use of base R constructs. The problem
with "modern" syntax is that it's subject to rapid change and often slower
than base R, which has had years to stabilise and optimise.

If you want to iterate this over several variables, nest your sapply calls:

items = c("cadmium", "copper","lead")
sapply(items, function(item){
 sapply(pts100, function(x){ mean(pts[[item]][x]) })
})

gets you a matrix with one row per point (first four rows shown):

         cadmium    copper      lead
  [1,] 10.150000  83.00000 288.00000
  [2,] 10.150000  83.00000 288.00000
  [3,]  6.500000  68.00000 199.00000
  [4,]  2.600000  81.00000 116.00000
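
If you would rather not list the variables by hand, here is a minimal sketch of the same idea applied to every numeric column at once. It assumes the pts and pts100 objects from the quoted code; the names dat, num and nbr_means are made up for illustration. Each point's neighbour set includes the point itself, as in the sapply version above, and columns with missing values (om, for example) will give NA unless you pass na.rm = TRUE to colMeans.

```r
library(sf)
library(sp)

data(meuse)
pts <- st_as_sf(meuse, coords = c("x", "y"), remove = FALSE)
pts100 <- st_is_within_distance(pts, dist = 100)

# Drop the geometry column and keep only the numeric variables
dat <- st_drop_geometry(pts)
num <- dat[sapply(dat, is.numeric)]

# For each point, take column means over the rows of its neighbours.
# sapply returns variables in rows, so transpose to get one row per point.
nbr_means <- t(sapply(pts100, function(i) colMeans(num[i, , drop = FALSE])))

head(nbr_means[, c("cadmium", "copper", "lead")])
```

The cadmium, copper and lead columns of nbr_means should match the nested sapply result above; this just saves typing the names, at the cost of also averaging x and y (which may be what you want if you are after centroids).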


Barry


> Above, I've figured out how to use sapply to do this variable by variable.
> So I could, if I wanted, calculate the mean for each variable, generate a
> centroid for each point and then a SpatialPointsDataFrame of the unique
> values. E.g., for the first few variables:
>
>     res <- data.frame(id=1:length(pts100),
>                       x=NA, y=NA,
>                       cadmium=NA, copper=NA, lead=NA)
>     res$x <- sapply(pts100, function(p){ mean(pts$x[p]) })
>     res$y <- sapply(pts100, function(p){ mean(pts$y[p]) })
>     res$cadmium <- sapply(pts100, function(p){ mean(pts$cadmium[p]) })
>     res$copper <- sapply(pts100, function(p){ mean(pts$copper[p]) })
>     res$lead <- sapply(pts100, function(p){ mean(pts$lead[p]) })
>     res2 <- res[!duplicated(res$cadmium),]
>     coordinates(res2) <- c("x","y")
>     bubble(res2,"cadmium")
>
>
> This works but seems cumbersome and like there must be a more efficient
> way.
>
>
> Thanks for any help, Andy