[R] findInterval(), binary search, log(N) complexity
Duncan Murdoch
murdoch at stats.uwo.ca
Mon Sep 22 22:54:14 CEST 2008
On 9/22/2008 1:51 PM, Markus Loecher wrote:
> Dear R users,
> the help for findInterval(x,vec) suggests a logarithmic dependence on N
> (=length(vec)), which would imply a binary search type algorithm.
> However, when I "test" this hypothesis, in the following manner:
R is open source. Why test things this way, when you can look at the
source? You don't even need to go to C code for this:
> findInterval
function (x, vec, rightmost.closed = FALSE, all.inside = FALSE)
{
if (any(is.na(vec)))
stop("'vec' contains NAs")
if (is.unsorted(vec))
stop("'vec' must be sorted non-decreasingly")
if (has.na <- any(ix <- is.na(x)))
x <- x[!ix]
nx <- length(x)
index <- integer(nx)
.C("find_interv_vec", xt = as.double(vec), n =
as.integer(length(vec)),
x = as.double(x), nx = as.integer(nx),
as.logical(rightmost.closed),
as.logical(all.inside), index, DUP = FALSE, NAOK = TRUE,
PACKAGE = "base")
if (has.na) {
ii <- as.integer(ix)
ii[ix] <- NA
ii[!ix] <- index
ii
}
else index
}
<environment: namespace:base>
Notice the "is.unsorted" test. How could that be anything other than
linear execution time in N? Similarly for any(ix <- is.na(x)).
If you know the answers to those tests (as you do in your simulation),
you could presumably get O(log(n)) behaviour by writing a new function
that skipped them. But you could take a look at the source code (in
https://svn.r-project.org/R/trunk/src/appl/interv.c) if you want to
check, or if you notice any weird timings.
Duncan Murdoch
>
> set.seed(-3645);
> l <- vector();
> N.seq <- c(5000, 500000, 1000000, 10000000, 50000000);k <- 1
> for (N in N.seq){
> tmp <- sort(round(stats::rt(N, df=2), 2));
> l[k] <- system.time(it3 <- findInterval(-1, tmp))[2];k <- k + 1;
> }
> plot(N.seq,l,type="b",xlab="length(vec)", ylab="CPU time");
>
> the resulting plot suggests a linear relationship.
> I must be missing sth. here ?
>
> Thanks !
>
> Markus
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list