[R] findInterval(), binary search, log(N) complexity

Duncan Murdoch murdoch at stats.uwo.ca
Mon Sep 22 22:54:14 CEST 2008


On 9/22/2008 1:51 PM, Markus Loecher wrote:
> Dear R users,
> the help for findInterval(x,vec) suggests a logarithmic dependence on N
> (=length(vec)), which would imply a binary search type algorithm.
> However, when I "test" this hypothesis, in the following manner:

R is open source.  Why test things this way, when you can look at the 
source?  You don't even need to go to C code for this:

 > findInterval
function (x, vec, rightmost.closed = FALSE, all.inside = FALSE)
{
     if (any(is.na(vec)))
         stop("'vec' contains NAs")
     if (is.unsorted(vec))
         stop("'vec' must be sorted non-decreasingly")
     if (has.na <- any(ix <- is.na(x)))
         x <- x[!ix]
     nx <- length(x)
     index <- integer(nx)
     .C("find_interv_vec", xt = as.double(vec), n = 
as.integer(length(vec)),
         x = as.double(x), nx = as.integer(nx), 
as.logical(rightmost.closed),
         as.logical(all.inside), index, DUP = FALSE, NAOK = TRUE,
         PACKAGE = "base")
     if (has.na) {
         ii <- as.integer(ix)
         ii[ix] <- NA
         ii[!ix] <- index
         ii
     }
     else index
}
<environment: namespace:base>

Notice the "is.unsorted" test.  How could that be anything other than 
linear execution time in N? Similarly for any(ix <- is.na(x)).

If you know the answers to those tests (as you do in your simulation), 
you could presumably get O(log(n)) behaviour by writing a new function 
that skipped them.  But you could take a look at the source code (in 
https://svn.r-project.org/R/trunk/src/appl/interv.c) if you want to 
check, or if you notice any weird timings.

Duncan Murdoch


> 
> set.seed(-3645);
> l <- vector();
> N.seq <- c(5000, 500000, 1000000, 10000000, 50000000);k <- 1
> for (N in N.seq){
>   tmp <- sort(round(stats::rt(N, df=2), 2));
>   l[k] <- system.time(it3 <- findInterval(-1, tmp))[2];k <- k + 1;
> }
> plot(N.seq,l,type="b",xlab="length(vec)", ylab="CPU time");
> 
> the resulting plot suggests a linear relationship.
> I must be missing sth. here ?
> 
> Thanks !
> 
> Markus
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list