[R-sig-Geo] point.in.polygon() on massive datasets

Edzer Pebesma edzer.pebesma at uni-muenster.de
Thu Dec 13 17:40:10 CET 2007


overlay for points in polygons is a wrapper around point.in.polygon, the 
same function that is used in gstat. What is missing in the R code (and 
the C source code) is, for example, a check of whether the point lies 
within the bounding box of a polygon before the full test is run, 
although the bbox is already computed.
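
To illustrate the bounding-box check described above, here is a minimal 
sketch in Python. It is not the sp or gstat code: the function names 
(bbox, in_bbox, in_polygon, assign) are hypothetical, polygons are assumed 
to be simple rings given as lists of (x, y) tuples without holes, and the 
exact test is a plain even-odd ray-casting routine that does not treat 
boundary points specially.

```python
def bbox(poly):
    """Return (xmin, ymin, xmax, ymax) for a polygon given as [(x, y), ...]."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return min(xs), min(ys), max(xs), max(ys)


def in_bbox(x, y, box):
    """Cheap rejection test: four comparisons per point/polygon pair."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def in_polygon(x, y, poly):
    """Even-odd ray casting; cost grows with the number of vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign(points, polygons):
    """Return, for each point, the index of the first containing polygon or -1."""
    boxes = [bbox(p) for p in polygons]  # computed once, reused for every point
    result = []
    for x, y in points:
        hit = -1
        for i, (poly, box) in enumerate(zip(polygons, boxes)):
            # The bbox test runs first, so most polygons are skipped cheaply.
            if in_bbox(x, y, box) and in_polygon(x, y, poly):
                hit = i
                break
        result.append(hit)
    return result
```

With a few hundred polygons that each cover a small part of the study 
area, most point/polygon pairs would be rejected by the four comparisons 
in in_bbox, so the vertex loop runs only for the few polygons whose box 
contains the point.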

It would of course speed things up incredibly to use tree indexes, 
either on the points or on the polygons, or preferably both, but this is 
currently not in the sp code. A nice student project!
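
As a rough illustration of indexing the polygons, the sketch below uses a 
uniform grid over the polygon bounding boxes rather than a tree; it is a 
simpler stand-in for an R-tree or quadtree, not something that exists in 
sp. The names build_grid and candidates, and the cell size parameter, are 
hypothetical. Boxes are assumed to be (xmin, ymin, xmax, ymax) tuples.

```python
from collections import defaultdict
from math import floor


def build_grid(boxes, cell):
    """Map each grid cell (ix, iy) to the ids of the boxes that touch it."""
    grid = defaultdict(list)
    for pid, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        # Register the box in every cell its extent overlaps.
        for ix in range(floor(xmin / cell), floor(xmax / cell) + 1):
            for iy in range(floor(ymin / cell), floor(ymax / cell) + 1):
                grid[(ix, iy)].append(pid)
    return grid


def candidates(grid, cell, x, y):
    """Return ids of polygons whose bounding box may contain (x, y)."""
    # .get avoids creating empty entries in the defaultdict on lookup.
    return grid.get((floor(x / cell), floor(y / cell)), [])
```

Only the candidate ids returned for a point would then need the exact 
point-in-polygon test, so the per-point cost depends on how many boxes 
share a cell instead of on the total number of polygons. The cell size is 
a tuning choice: cells much smaller than the polygons inflate the index, 
cells much larger return too many candidates.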
--
Edzer

Barry Rowlingson wrote:
> Markus Loecher wrote:
>   
>> Dear all,
>> I have a dataset of about 50 million lat/lon coordinates each of which falls into one of 550 polygons.
>> I need to assign their memberships and have used point.in.polygon() for that purpose.
>> However, the simple way of looping over the 50 million points clearly takes a looong time; 1 million points took about 3-4 days on a fast Linux server with lots of memory.
>> Am I overlooking obvious ways of making this massive computation more efficient? Would R-trees help?
>> Should I try to compile the C code for point.in.polygon() (available from gstat) and run it outside R as a standalone executable?
>> I am already using apply() to mitigate the inefficiency of the for loop in R.
>>
>> Any help would be greatly appreciated,
>>
>>     
>
>   Have you tried the 'overlay' functions from the sp package? Overlaying 
> points on polygons using those checks all the polygons for each point in 
> one go, so it may do some spatial tree optimising... You might have to 
> do your 50 million points in batches though...
>
> Barry
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>



