[R-sig-Geo] knearneigh() computing efficiency on large datasets

Edzer Pebesma edzer.pebesma at uni-muenster.de
Mon Feb 1 08:39:06 CET 2010



Pierre Roudier wrote:
> I was wondering if it could be a solution to use RSAGA or spgrass6 to
> do that. Feedback, anyone ?
>   
PostGIS uses a spatial index; if you provide it with a bounding box, it 
should be able to select features (points/lines/polygons) within it. I 
don't know how this relates to grids, and  haven't tried to exploit 
this. rgdal with postgis support seems to be hard to get to work on 
windows; another interface could be RODBC.
> Cheers,
>
> Pierre
>
>
>
> 2010/1/29 Edzer Pebesma <edzer.pebesma at uni-muenster.de>:
>   
>> Pierre Roudier wrote:
>>     
>>> Dear list,
>>>
>>> I am processing gridded data like DEMs, imported in R from geotiff
>>> files. Of course, such data is regularly gridded, but very often with
>>> NA values. In the framwork of some tests, I have to generate a
>>> neighbourhood of the DEM, so that to extract local extrema of the
>>> layer. For this task, I wrote a function based on  the knearneig()
>>> function from the spdep package, with k=4 or k=8, to generate and
>>> analyse the neighbourhood of each point.
>>>
>>> Unfortunately, I often have to process big datasets (let's say grids
>>> with between  50,000 and 350,000 points), and that's working but
>>> knearneigh() takes *hours* to process.
>>>
>>> Does anybody would have any suggestion to improve the efficiency of this
>>> step ?
>>>
>>>       
>> the algorithm (knn.c file in the spdep sources) seems to compute all
>> distances between all grid cells, and for each grid cell find the nearest k
>> by some kind of insertion sort. That makes it general for all kinds of
>> spatial data points, including irregularly spread ones, but makes it also
>> inefficient (because O(n^2) with n the number of grid cells) for large data
>> sets.
>>
>> Suggestions to improve this would be to use an index structure (for the
>> generic case of possibly irregularly spread data) such as the PR-bucket
>> quadtree used in package gstat (nsearch.c), or specifically for gridded data
>> to write a special knearneigh() that exploits the grid indexing already
>> present: you can explicitly address the nearest 4/8/12/... neighbours by
>> using row/col indexes. In this latter case you'd probably want to rewrite
>> the test function for the full grid, instead of looping over the cells.
>>     
>>> Thanks,
>>>
>>> Pierre
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>       
>> --
>> Edzer Pebesma
>> Institute for Geoinformatics (ifgi), University of Münster Weseler Straße
>> 253, 48151 Münster, Germany. Phone: +49 251 8333081, Fax: +49 251 8339763
>>  http://ifgi.uni-muenster.de http://www.52north.org/geostatistics
>>  e.pebesma at wwu.de
>>
>>
>>     

-- 
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster 
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251 
8333081, Fax: +49 251 8339763  http://ifgi.uni-muenster.de 
http://www.52north.org/geostatistics      e.pebesma at wwu.de



More information about the R-sig-Geo mailing list