[R-sig-Geo] knearneigh() computing efficiency on large datasets
edzer.pebesma at uni-muenster.de
Fri Jan 29 08:45:38 CET 2010
Pierre Roudier wrote:
> Dear list,
> I am processing gridded data like DEMs, imported in R from geotiff
> files. Of course, such data is regularly gridded, but very often with
> NA values. In the framwork of some tests, I have to generate a
> neighbourhood of the DEM, so that to extract local extrema of the
> layer. For this task, I wrote a function based on the knearneig()
> function from the spdep package, with k=4 or k=8, to generate and
> analyse the neighbourhood of each point.
> Unfortunately, I often have to process big datasets (let's say grids
> with between 50,000 and 350,000 points), and that's working but
> knearneigh() takes *hours* to process.
> Does anybody would have any suggestion to improve the efficiency of this step ?
the algorithm (knn.c file in the spdep sources) seems to compute all
distances between all grid cells, and for each grid cell find the
nearest k by some kind of insertion sort. That makes it general for all
kinds of spatial data points, including irregularly spread ones, but
makes it also inefficient (because O(n^2) with n the number of grid
cells) for large data sets.
Suggestions to improve this would be to use an index structure (for the
generic case of possibly irregularly spread data) such as the PR-bucket
quadtree used in package gstat (nsearch.c), or specifically for gridded
data to write a special knearneigh() that exploits the grid indexing
already present: you can explicitly address the nearest 4/8/12/...
neighbours by using row/col indexes. In this latter case you'd probably
want to rewrite the test function for the full grid, instead of looping
over the cells.
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de
http://www.52north.org/geostatistics e.pebesma at wwu.de
More information about the R-sig-Geo