[R-sig-Geo] Finding polygons corresponding to points

Roger Bivand Roger.Bivand at nhh.no
Mon Nov 9 13:27:16 CET 2009

On Mon, 9 Nov 2009, Karl Ove Hufthammer wrote:

> Dear list members
>
> I have a collection of polygons (a SpatialPolygonsDataFrame object, but
> I can easily convert it to different formats), and a collection of
> coordinate points. Each point corresponds to (at most) one polygon.
>
> For each point, I need to find the corresponding polygon. You can think
> of the points as positions of for example earthquakes, and the polygons
> as state or country borders. For each earthquake, I need to calculate
> the state/country where the earthquake occured.
>
> There are a number of functions to check if a point is inside a given
> polygon, e.g., point.in.polygon, and I *could* write a double loop that
> for each point checks each polygon. But as I have very many points (tens
> of thousands) and hundreds of polygons, this may be quite slow.
>
> Is there an easier and more efficient way of / existing function for
> doing this? As each point is only inside (at most) one polygon, and the
> number of points inside each polygon varies a great deal (for example,
> most earthquakes occur in California), there clearly is much room for
> optimisation.

Try the relevant overlay() methods in sp. They ought to be fast enough:

library(maptools)
data(wrld_simpl)
class(wrld_simpl)
length(row.names(wrld_simpl))
system.time(pts <- spsample(wrld_simpl, n=20000, type="random"))
class(pts)
system.time(o <- overlay(pts, wrld_simpl))
range(o)

The o vector says which Polygons object the points fall, 1 isn't included
here as it didn't take any hits in spsample (Antigua and Barbuda).
Generating the point samples also involves a lot of point-in-polygon too,
so making the example takes twice as long for me (10s) than doing the
overlay (4s).

Roger

>
>

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of