[R-sig-Geo] Running huge dataset with dnearneigh

Sat Jun 29 20:29:41 CEST 2019

>
> Date: Sat, 29 Jun 2019 00:36:22 +0100
> From: Jiawen Ng <lovelylittledaisies using gmail.com>
> To: r-sig-geo using r-project.org
> Subject: [R-sig-Geo] Running huge dataset with dnearneigh
>
> How can we deal with a huge dataset when using dnearneigh?
>
> Here is my code:
>
> d <- dnearneigh(spdf, 0, 22000)
> all_listw <- nb2listw(d, style = "W")
>
> where the spdf object is in the British National Grid CRS
> (+init=epsg:27700) and has 227,973 observations/points. The distance
> threshold of 22,000 (metres in this CRS) was chosen using a training set
> of 214 observations; the spdf object contains both the training set and
> the testing set.

I have had good results using the rtree package to compute nearest
neighbors. It is very fast and has relatively low memory requirements. I
have not tried it with this many points, but it works well up to 10,000 or
so. If I understand the dnearneigh docs correctly, the
rtree::withinDistance function does something similar: it returns the
points that lie within a given distance of each query point.
https://github.com/hunzikp/rtree
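
A minimal sketch of that approach, under some assumptions: the rtree
package is installed from the GitHub repository above (I believe it is not
on CRAN), and RTree() and withinDistance() behave as described in its
README, returning 1-based row indices. The toy matrix xy is a hypothetical
stand-in for the real coordinates, which would come from
sp::coordinates(spdf).

```r
# Sketch only: assumes the rtree package from the GitHub repository above,
# installed with e.g. remotes::install_github("hunzikp/rtree").
library(rtree)

# Toy coordinates standing in for sp::coordinates(spdf); units are metres.
xy <- rbind(c(0, 0), c(3, 0), c(0, 4), c(100, 100))

# Build the spatial index once, then query it with the same points.
tree <- RTree(xy)
hits <- withinDistance(tree, xy, 4.5)

# A query point is expected to match itself, so drop the self-index to
# mimic dnearneigh(), which does not list a point as its own neighbour.
nbrs <- lapply(seq_along(hits), function(i) sort(setdiff(hits[[i]], i)))
```

For the data in this thread the distance would be 22000 rather than 4.5.
The result is a plain list of index vectors, not an spdep nb object, so it
would still need converting before it can go to nb2listw. With this many
points and such a large distance, the number of neighbour links may itself
be the memory bottleneck.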

Kent Johnson
