[R] Nearest Neighbor Algorithm in R -- again.

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Feb 2 23:29:38 CET 2004


Why not modify the C code of knn? (Not knn1)

On Mon, 2 Feb 2004, Hans W Borchers wrote:

> Several of the methods I use for analyzing large data sets, such as
> 
>   WinGamma: determining the level of noise in data
>   Relief-F: estimating the influence of variables
> 
> depend on finding the k nearest neighbors of a point in a data frame or
> matrix efficiently. (For large data sets it is not feasible to compute
> the 'dist' matrix anyway.)
> 
> Seeing the proposed solution to "[R] distance between two matrices"
> last month for finding _one_ nearest neighbor I came up with a solution
> 'nearest(A, n, k)' as appended.
> 
> Still, this is very clumsy and slow if you have to find the 3 nearest
> neighbors for 1000 points in a data frame with 100000 entries at least
> -- about 2 secs per data point on my computer or half an hour for an
> application from real life.
> 
> Are there better/faster ways to perform this task using R functions?
> Even better, is there a free implementation of kd-trees that I could
> utilize (the one I found did not conform to the ANSI C standard)?
> 
> Someome pointed to 'spdep' of the R-Sig-Geo project, but 'knearneigh'
> only accepts 2D data points.
> 
> Hans Werner Borchers
> ABB Corporate Research, Germany
> ________________________________________________________________________
> 
> require (class)
> nearest <- function (X, n, k=3)
> ##  Find k nearest neighbors of X[n, ] in the data frame
> ##  or matrix X, utilizing function knn1 k-times.
> {
>     N <- nrow(X)
>     # inds contains the indices of nearest neighbors
>     inds <- c(n); i <- 0
>     while (i < k) {
>         # use knn1 for one index...
>         j  <- as.integer(knn1(X [-inds, ], X[n, ], 1:(N-length(inds))))
>         # ...and change to true index of neighbor
>         inds <- c(inds, setdiff(1:N, inds)[j])
>         i <- i+1
>     }
>     # return nearest neighbor indices (without n, of course)
>     return(inds[-1])
> }
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list