# [R] Summary: Vectorizing closest match

Frank E Harrell Jr fharrell at virginia.edu
Thu Mar 28 23:54:40 CET 2002

```The original problem I posed was

Let

x = real vector of length n
y = real vector of length n
w = real vector of length m, m typically less than n/2 but can be > n
z = real vector of length m

For w[i], i=1,,,m, find the value of x that is closest to w[i].  In the
case of ties, select one (optimally at random or just take the first
match).  Let z[i] = value of y corresponding to the closest x.

I received several helpful replies.  Peter Dalgaard suggested the use of approx.  approx always amazes me.  To return the index of the closest match you can use round(approx(x,1:length(x),xout=w,rule=2,ties='ordered')\$y)
For S-Plus just remove ties=.

David James suggested using cut.  I adapted his code and speeded it up as he suggested, by going right to the .C call in cut.default (set global variable .R. to TRUE for R, FALSE for S-Plus):

whichClosest <- function(x, w) {
## x: vector of reference values
## w: vector of values to find closest matches in x
## Returns: subscripts in x corresponding to w
i <- order(x)
x <- x[i]
n <- length(x)
br <- c(-1e30, x[-n]+diff(x)/2,1e30)
m <- length(w)
if(.R.) i[.C("bincode", as.double(w), m, as.double(br),
length(br), code = integer(m), right = TRUE,
include = FALSE, NAOK = TRUE, DUP = FALSE,
PACKAGE = "base")\$code] else
i[.C("S_binning3", x=as.double(w), m, as.double(br),
length(br), 0, 0, TRUE, TRUE)\$x]  # For S-Plus
}

Note that for large n, cut.default is extremely slow.

Thomas Lumley had a nice approach using a new function findInterval.  All three approaches are extremely fast.  The main difference between approx and the cut approach is where ties are shuffled.  I plan to use approx.  To randomize choices in case of ties one can jitter the x vector.

Thanks again all,

Frank
--
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

```