[R] Vectorizing closest match

Thomas Lumley tlumley at u.washington.edu
Thu Mar 28 17:54:50 CET 2002

```On Thu, 28 Mar 2002, Frank E Harrell Jr wrote:

> If anyone has a very fast vectorized method for doing the following I
> would appreciate some help.  I want to avoid outer() to limit memory
> problems for very large n.
>
> Let
>
> x = real vector of length n
> y = real vector of length n
> w = real vector of length m, m typically less than n/2 but can be > n
> z = real vector of length m
>
> For w[i], i=1,...,m, find the value of x that is closest to w[i].  In the
> case of ties, select one (optimally at random or just take the first
> match).  Let z[i] = value of y corresponding to the closest x.
>

I believe the following will work in pre-1.5.0 using the new findInterval
function (modulo any remaining off-by-one errors, which should be easy to
fix):

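# sort x, carrying y along in the same order
oo<-order(x)
x<-x[oo]
y<-y[oo]
n<-length(x)
# find the interval for each w, so that x[j] <= w < x[j+1]
# (clamp j to 1..n-1 so both endpoints exist; assumes n >= 2)
j<-pmin(n-1,pmax(1,findInterval(w,x)))
xlow<-x[j]
xhi<-x[j+1]
# which end of the interval is closer? (ties go to the lower end)
bestj<-ifelse(xhi-w<w-xlow,j+1,j)
# get z from the closer endpoint
z<-y[bestj]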

For length(x)=10^5 it takes 0.5s with length(w)=length(x)/3 and 2.5s with
length(w)=length(x)*3 on my machine with a current pre-1.5.0.

-thomas


```
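The findInterval approach described in the post can be wrapped in a function and checked against a brute-force search. This is only a sketch under stated assumptions: the name `closest_match` is hypothetical (it is not from the original post), x is assumed to have at least two elements and no NAs, the interval index is clamped to 1..n-1 so that both endpoints always exist, and ties go to the lower endpoint rather than being broken at random.

```r
# Hypothetical helper; the name and the clamping are assumptions, not from the post.
closest_match <- function(x, y, w) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  oo <- order(x)
  xs <- x[oo]                      # sorted x
  ys <- y[oo]                      # y carried along in the same order
  n  <- length(xs)
  # index of the lower endpoint of the interval holding each w,
  # clamped so that j and j + 1 are both valid indices
  j <- pmin(n - 1, pmax(1, findInterval(w, xs)))
  # pick the nearer endpoint; ties go to the lower one
  bestj <- ifelse(xs[j + 1] - w < w - xs[j], j + 1, j)
  ys[bestj]
}

# Small random check against an O(n * m) brute-force search
set.seed(1)
x <- runif(200)
y <- rnorm(200)
w <- runif(50, -0.2, 1.2)          # includes values outside range(x)
z <- closest_match(x, y, w)
z_ref <- vapply(w, function(wi) y[which.min(abs(x - wi))], numeric(1))
stopifnot(isTRUE(all.equal(z, z_ref)))
```

The brute-force version is used only as a reference; it scans all of x for each element of w, which is the cost the sorted lookup avoids without building the n-by-m matrix that outer() would need.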