[R] General binary search?

William Dunlap wdunlap at tibco.com
Mon Apr 4 22:50:28 CEST 2011


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Stavros Macrakis
> Sent: Monday, April 04, 2011 1:15 PM
> To: r-help
> Subject: [R] General binary search?
> 
> Is there a generic binary search routine in a standard library which
> 
>    a) works for character vectors
>    b) runs in O(log(N)) time?
> 
> I'm aware of findInterval(x,vec), but it is restricted to 
> numeric vectors.

xtfrm(x) will convert a character (or other) vector to
a numeric vector with the same ordering.  findInterval
can work on that.  E.g., 
   > f0 <- function(x, vec) {
       tmp <- xtfrm(c(x, vec))
       findInterval(tmp[seq_along(x)], tmp[-seq_along(x)])
     }
   > f0(c("Baby", "Aunt", "Dog"), LETTERS)
   [1] 2 1 4
I've never looked at its speed.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> I'm also aware of various hashing solutions (e.g. 
> new.env(hash=TRUE) and
> fastmatch), but I need the greatest-lower-bound match in my 
> application.
> 
> findInterval is also slow for large N=length(vec) because of the O(N)
> checking it does, as Duncan Murdoch has pointed
> out<https://stat.ethz.ch/pipermail/r-help/2008-September/174584.html>:
> though
> its documentation says it runs in O(n * log(N)), it actually 
> runs in O(n *
> log(N) + N), which is quite noticeable for largish N.  But 
> that is easy
> enough to work around by writing a variant of findInterval which calls
> find_interv_vec without checking.
> 
>             -s
> 
> PS Yes, binary search is a one-liner in R, but I always prefer to use
> standard, fast native libraries when possible....
> 
> binarysearch <- function(val,tab,L,H) {while (H>=L) { 
> M=L+(H-L) %/% 2; if
> (tab[M]>val) H<-M-1 else if (tab[M]<val) L<-M+1 else return(M)};
> return(L-1)}
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



More information about the R-help mailing list