[R-sig-Geo] point.in.polygon() on massive datasets

Markus Loecher loecher at eden.rutgers.edu
Thu Dec 13 16:23:35 CET 2007


Dear all,
I have a dataset of about 50 million lat/lon coordinates, each of which falls into one of 550 polygons.
I need to assign their memberships and have used point.in.polygon() for that purpose.
However, simply looping over the 50 million points clearly takes a very long time: 1 million points took about 3-4 days on a fast Linux server with lots of memory.
Am I overlooking obvious ways of making this massive computation more efficient? Would R-trees help?
Should I try to compile the C code for point.in.polygon() (available from gstat) and run it outside R as a standalone executable?
I am already using apply() to mitigate the inefficiency of the for loop in R.
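
To make the standalone option concrete, here is a minimal sketch in C of what such a program might do per point: an even-odd ray-casting test with a bounding-box rejection in front of it, so that most of the 550 polygons can be discarded after four comparisons. This is not the code shipped with gstat or sp. The Polygon struct and the names polygon_set_bbox, point_in_polygon and find_polygon are hypothetical, and points lying exactly on an edge are not treated specially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical polygon type for illustration only (not the sp/gstat layout):
   a ring of n vertices, first vertex not repeated, plus its bounding box. */
typedef struct {
    const double *x, *y;            /* vertex coordinates (lon, lat) */
    size_t n;                       /* number of vertices */
    double xmin, xmax, ymin, ymax;  /* filled in by polygon_set_bbox() */
} Polygon;

/* Compute the bounding box once per polygon, before any point is tested. */
void polygon_set_bbox(Polygon *p)
{
    assert(p->n >= 3);
    p->xmin = p->xmax = p->x[0];
    p->ymin = p->ymax = p->y[0];
    for (size_t i = 1; i < p->n; i++) {
        if (p->x[i] < p->xmin) p->xmin = p->x[i];
        if (p->x[i] > p->xmax) p->xmax = p->x[i];
        if (p->y[i] < p->ymin) p->ymin = p->y[i];
        if (p->y[i] > p->ymax) p->ymax = p->y[i];
    }
}

/* Even-odd ray casting: returns 1 if (px, py) is inside, 0 otherwise. */
int point_in_polygon(const Polygon *p, double px, double py)
{
    /* Cheap rejection: outside the bounding box means outside the polygon. */
    if (px < p->xmin || px > p->xmax || py < p->ymin || py > p->ymax)
        return 0;

    int inside = 0;
    /* Walk every edge (j -> i) and count crossings of a ray going right. */
    for (size_t i = 0, j = p->n - 1; i < p->n; j = i++) {
        int straddles = (p->y[i] > py) != (p->y[j] > py);
        if (straddles) {
            double xcross = (p->x[j] - p->x[i]) * (py - p->y[i])
                            / (p->y[j] - p->y[i]) + p->x[i];
            if (px < xcross)
                inside = !inside;
        }
    }
    return inside;
}

/* Index of the first polygon containing the point, or -1 if none does. */
long find_polygon(const Polygon *polys, size_t npoly, double px, double py)
{
    for (size_t k = 0; k < npoly; k++)
        if (point_in_polygon(&polys[k], px, py))
            return (long)k;
    return -1;
}
```

A driver would read the polygons once, call polygon_set_bbox() on each, and then stream the points through find_polygon(). An R-tree over the bounding boxes would replace the linear scan in find_polygon(); with only 550 polygons it is not obvious that this is needed.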

Any help would be greatly appreciated,

Thanks,

Markus



