[R-sig-Geo] in or out of polygon efficiency question

Roger Bivand Roger.Bivand at nhh.no
Fri Nov 28 20:37:00 CET 2008


On Fri, 28 Nov 2008, Jon Loehrke wrote:

> Greetings,
>
> I would like to assign spatial points in a large dataset to an area defined 
> by complex polygon's.  These are 'statistical reference' areas and are 
> helpful in indexing my data.  Currently I have 40,000 spatial observations 
> (lats and longs) and 42 named polygons.
>
> I've found a way to do this with a for loop and using inout from splancs. 
> This is however quite slow and I am curious if anyone would know of a way to 
> speed this up.

Use the overlay method in the sp package - the polygons must be a 
SpatialPolygons object, the points a SpatialPoints (or their *DataFrame 
subclasses). Both are easy to read in using readOGR in rgdal. Using your 
data:

coordinates(dat) <- c("x", "y")
group1 <- lapply(seq(along=group), function(x)
   Polygons(list(Polygon(as.matrix(group[[x]]))), ID=x))
group2 <- SpatialPolygons(group1)
o <- overlay(group2, dat)
plot(dat)
plot(group2, add=TRUE, border="green")
points(dat, col=o)
dat$area <- o
summary(dat)

If you want the names in group as the IDs, use lapply(names(group), ... 
then:

IDs <- sapply(slot(group2, "polygons"), function(x) slot(x, "ID"))
dat$area <- factor(IDs[o])
summary(dat)

For 40K points in dat, this ran in less than half a second for me, but for 
irregular polygons would take somewhat longer, because there is much more 
computational geometry to do.

Hope this helps,

Roger



>
> Thank you very much for your help and ideas!
>
> ############
> library(splancs)
>
> group<-list(one=data.frame(x=c(-69,-69,-70,-70,-69), 
> y=c(39.8333,39,39,39.8333,39.8333)),
> 			two=data.frame(x=c(-68.8333,-66.6667,-66.6667,-69,-69,-68.8333), 
> y=c(39.8333,39.8333,39,39,39.8333,39.8333)),
> 			three=data.frame(x=c(-65.75,-65.74,-65.6667,-65.6667,66.6667,66.6667,66.6667,-65.75), 
> y=c(40.5,40.5,40.5,39,39,39.8333,40.5,40.5)))
>
> set.seed(50)
>
> dat<-data.frame(x=runif(100,-70,-65), y=runif(100,39,41), area=NA)
>
> for(i in 1:length(dat$x)){
> 	if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else 
> {if(inout(cbind(dat$x[i],dat$y[i]),group$one)){dat$area[i]<-1}}
> 	if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else 
> {if(inout(cbind(dat$x[i],dat$y[i]),group$two)){dat$area[i]<-2}}
> 	if(is.na(dat$x[i]) | is.na(dat$y[i])) {dat$area[i]<-NA} else 
> {if(inout(cbind(dat$x[i],dat$y[i]),group$three)){dat$area[i]<-3}}
> }
>
> dat$area
>
> # [1] NA NA NA NA  2 NA NA NA NA  1 NA  2  2  1  2 NA NA NA NA NA NA  1 NA NA 
> NA
> #[26] NA  2  2  2  2  2  2 NA NA  2  2 NA NA  2 NA NA NA  2  1  1 NA  3 NA NA 
> NA
> #[51]  3 NA  2 NA NA NA NA  2 NA  2  2  2 NA NA NA NA NA  1 NA  3 NA NA  2  1 
> 3
> #[76]  1  2 NA NA NA NA  1 NA NA NA NA NA NA NA  2 NA NA NA NA NA  2  1 NA NA 
> NA
> ######################
>
>
> Jon Loehrke
> Graduate Research Assistant
> Department of Fisheries Oceanography
> School for Marine Science and Technology
> University of Massachusetts
> 200 Mill Road, Suite 325
> Fairhaven, MA 02719
> jloehrke at umassd.edu
>
> sessionInfo()
> R version 2.8.0 Patched (2008-11-24 r47017)
> i386-apple-darwin9.5.0
>
> locale:
> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] splancs_2.01-24 sp_0.9-28
>
> loaded via a namespace (and not attached):
> [1] grid_2.8.0      lattice_0.17-17
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no




More information about the R-sig-Geo mailing list