[R] distance between two matrices

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Jan 28 16:08:47 CET 2004


On Wed, 28 Jan 2004, "Hüsing, Johannes" wrote:

> > Hi all,
> >    Say I have a matrix A with dimension m x 2 and matrix B with 
> > dimension n x 2. I would like to find the row in A that is closest to 
> > each row in B. Here's an example (using a loop):
> > 
> > set.seed(1)
> > A <- matrix(runif(12), 6, 2) # 6 x 2
> > B <- matrix(runif(6), 3, 2)  # 3 x 2
> > m <- vector("numeric", nrow(B))
> 
> Make the lines below a function of a vector argument and 
> apply it over the rows of B.
> 
> See ?apply for more info. You'll want to know about apply if
> you want to avoid loops (which is a good approach).

Unfortunately apply() is a wrapper for a for() loop, so it will not help
much (if at all).
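For concreteness, the apply() suggestion might look like the sketch below (`nearest_row` and `m_apply` are illustrative names, not from the thread). Because apply() still calls `nearest_row` once per row of B, it is tidier than the explicit loop but should not be expected to run much faster.

```r
set.seed(1)
A <- matrix(runif(12), 6, 2)  # 6 x 2 reference points
B <- matrix(runif(6), 3, 2)   # 3 x 2 query points

# Index of the row of A closest (in squared Euclidean distance) to point b.
# t(A) is 2 x nrow(A); subtracting the length-2 vector b recycles it down
# each column, so colSums() gives one squared distance per row of A.
nearest_row <- function(b) which.min(colSums((t(A) - b)^2))

# apply() loops over the rows of B internally, calling nearest_row each time
m_apply <- apply(B, 1, nearest_row)
```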

> > for(j in 1:nrow(B)) {
> >    # squared Euclidean distances from row j of B to every row of A
> >    d <- (A[, 1] - B[j, 1])^2 + (A[, 2] - B[j, 2])^2
> >    m[j] <- which.min(d)
> > }

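You can improve this a bit: see predict.qda (in package MASS), which loops
over the few groups and vectorizes over the many cases. Here that means
looping over the ~1K rows of A rather than the ~140K rows of B.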
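A minimal sketch of that loop inversion, assuming the point of the predict.qda hint is its loop structure (`nearest_by_ref_loop` is a hypothetical helper name). It keeps a running minimum over the rows of A, so with the poster's sizes it runs about 1K iterations over vectors of length 140K instead of 140K iterations over vectors of length 1K.

```r
set.seed(1)
A <- matrix(runif(12), 6, 2)  # 6 x 2 reference points
B <- matrix(runif(6), 3, 2)   # 3 x 2 query points

# Loop over the (few) rows of A, vectorizing over the (many) rows of B,
# keeping the smallest squared distance so far and the row of A achieving it.
nearest_by_ref_loop <- function(A, B) {
  best_d <- rep(Inf, nrow(B))   # smallest squared distance seen so far
  best_i <- integer(nrow(B))    # index of the row of A achieving it
  for (i in seq_len(nrow(A))) {
    d <- (B[, 1] - A[i, 1])^2 + (B[, 2] - A[i, 2])^2
    closer <- d < best_d        # strict '<' keeps the first minimum, like which.min()
    best_d[closer] <- d[closer]
    best_i[closer] <- i
  }
  best_i
}

m2 <- nearest_by_ref_loop(A, B)
```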

> > All I need is m[]. I would like to accomplish this without using the 
> > loop if possible, since for my real data n > 140K and m > 1K. I hope 
> > this makes sense.
> 
> The thing is, the above approach requires all the data to be in main
> memory. I hope this is not a problem.

A 140K x 2 numeric array takes up about 2.2Mb (140,000 x 2 x 8 bytes), and
R needs roughly 10x that just to run at all, so memory is not the problem.
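The storage figure is easy to check: a numeric matrix uses 8 bytes per element, and base R's object.size() reports that payload plus a small fixed header. A quick sketch (the variable names are illustrative):

```r
n_rows <- 140000
bytes <- n_rows * 2 * 8   # 8 bytes per double
bytes / 1e6               # 2.24 (million bytes)

# object.size() reports the same payload plus a small header
x <- matrix(0, n_rows, 2)
print(object.size(x), units = "Mb")
```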

Several people have mentioned knn1 (in package class) as a C-level
equivalent of the loops, and my timings suggest it is probably fast
enough. Roger Bivand mentioned quadtrees, which belong to a class of
spatial search structures that are possible solutions if you need extra
speed. Which member of that class is suitable depends on the spatial
distribution of A and B (viewing the rows as 2D points), but it is hard
to do very much better with only around 1000 reference points.
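A minimal sketch of the knn1 approach, assuming the class package is installed (it is a recommended package shipped with R). The trick is to give each row of A its own class label, so the predicted class of each row of B is the index of its nearest row of A; `ref_ids` and `m_knn` are illustrative names. The knn1 documentation says ties are broken at random, so with (near-)tied distances the result can differ from which.min().

```r
library(class)  # provides knn1(train, test, cl)

set.seed(1)
A <- matrix(runif(12), 6, 2)  # reference points
B <- matrix(runif(6), 3, 2)   # query points

# One label per row of A; factor() sorts the integer levels numerically,
# so the factor codes coincide with the row indices of A.
ref_ids <- factor(seq_len(nrow(A)))
m_knn <- as.integer(knn1(train = A, test = B, cl = ref_ids))
```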

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595




More information about the R-help mailing list