[R] Summary: Vectorizing closest match
Frank E Harrell Jr
fharrell at virginia.edu
Thu Mar 28 23:54:40 CET 2002
The original problem I posed was
Let
x = real vector of length n
y = real vector of length n
w = real vector of length m, m typically less than n/2 but can be > n
z = real vector of length m
For w[i], i=1,,,m, find the value of x that is closest to w[i]. In the
case of ties, select one (optimally at random or just take the first
match). Let z[i] = value of y corresponding to the closest x.
I received several helpful replies. Peter Dalgaard suggested the use of approx. approx always amazes me. To return the index of the closest match you can use round(approx(x,1:length(x),xout=w,rule=2,ties='ordered')$y)
For S-Plus just remove ties=.
David James suggested using cut. I adapted his code and speeded it up as he suggested, by going right to the .C call in cut.default (set global variable .R. to TRUE for R, FALSE for S-Plus):
whichClosest <- function(x, w) {
## x: vector of reference values
## w: vector of values to find closest matches in x
## Returns: subscripts in x corresponding to w
i <- order(x)
x <- x[i]
n <- length(x)
br <- c(-1e30, x[-n]+diff(x)/2,1e30)
m <- length(w)
if(.R.) i[.C("bincode", as.double(w), m, as.double(br),
length(br), code = integer(m), right = TRUE,
include = FALSE, NAOK = TRUE, DUP = FALSE,
PACKAGE = "base")$code] else
i[.C("S_binning3", x=as.double(w), m, as.double(br),
length(br), 0, 0, TRUE, TRUE)$x] # For S-Plus
}
Note that for large n, cut.default is extremely slow.
Thomas Lumley had a nice approach using a new function findInterval. All three approaches are extremely fast. The main difference between approx and the cut approach is where ties are shuffled. I plan to use approx. To randomize choices in case of ties one can jitter the x vector.
Thanks again all,
Frank
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list