[R] effective way to return only the first argument of "which()"

Rui Barradas ruipbarradas at sapo.pt
Fri Sep 21 20:56:23 CEST 2012


Hello,

Since no one took the bait, I've written some C code and run some tests.
The tests are divided in two parts, one uses a random vector 'x', the 
other makes sure the condition will only be met at the end.v See below. 
But first

1. Like Bert said, now your data. I've tested Milan's code, not his, and 
the results are clearly better than which() and which.max for the test 
case 1 but the worst of all for test case 2.

2. which.max can be potentially dangerous. If the condition is never met 
it returns 1, the first max of a sequence of zeros...
This is not what is intended and unlike which() which returns NA. In 
this case Milan's returns NULL and my C functions NA.

3. Like Michael's Rcpp code, my first function searches for TRUE in a 
logical vector, the others use his suggestion to hard code the 
condition. The latter is by far the fastest of all (expected). What 
surprised me was to see the former be comparable to which(). Run in a 
clean R session I had speedups of 10-20%. With several of packages 
loaded, a bit more, 50%. Still very modest and not worth the time spent 
in programming and testing an external routine. I was really well 
impressed with which().

General conclusion: don't speak before knowing.

And now the code.


dynLoad <- function(lib)
     dyn.load(paste(lib, .Platform$dynlib.ext, sep = ""))
dynUnload <- function(lib)
     dyn.unload(paste(lib, .Platform$dynlib.ext, sep = ""))

dynLoad("firstCond")

firstTrue <- function(x) .Call("firstTrue", as.logical(x))
firstGreater <- function(x, a){
     if(is.integer(x))
         .Call("firstGreaterInt", x, as.integer(a))
     else if(is.numeric(x))
         .Call("firstGreaterDbl", x, as.double(a))
     else
         stop("'x' not numeric.")
}

firstGreater(y, b)

find <- function(x, lim) {
     len <- length(x)
     for(n in 2^(14:0)) {
         val <- which(x[seq.int(1L, len/n)] > lim)
         if(length(val) > 0) return(val[1])
     }
     return(NULL)
}
findc <- compiler::cmpfun(find)

f1 <- function(x, a) which(x > a)[1]
f2 <- function(x, a) which.max(x > a)
f3 <- function(x, a) find(x, a)
f4 <- function(x, a) findc(x, a)
f5 <- function(x, a) firstTrue(x > a)
f6 <- function(x, a) firstGreater(x, a)

# ---- Test case 1 ----
set.seed(6824)
x <-runif(1e7)
a <- 0.999

n <- 100L
t1 <- system.time(for(i in 1:n) r1 <- f1(x, a))
t2 <- system.time(for(i in 1:n) r2 <- f2(x, a))
t3 <- system.time(for(i in 1:n) r3 <- f3(x, a))
t4 <- system.time(for(i in 1:n) r4 <- f4(x, a))
t5 <- system.time(for(i in 1:n) r5 <- f5(x, a))
t6 <- system.time(for(i in 1:n) r6 <- f6(x, a))

c(r1, r2, r3, r4, r5, r6) # equal results?
rbind(t1, t2, t3, t4, t5, t6) # Times

# Tell which of f4 and f6 is faster
n <- 100000L
t4 <- system.time(for(i in 1:n) r4 <- f4(x, a))
t6 <- system.time(for(i in 1:n) r6 <- f6(x, a))
rbind(t4, t6) # Times

# ---- Test case 2 ----
y <- 1:1e7
b <- 10000000L - 1L

n <- 100L
t1 <- system.time(for(i in 1:n) r1 <- f1(y, b))
t2 <- system.time(for(i in 1:n) r2 <- f2(y, b))
t3 <- system.time(for(i in 1:n) r3 <- f3(y, b))
t4 <- system.time(for(i in 1:n) r4 <- f4(y, b))
t5 <- system.time(for(i in 1:n) r5 <- f5(y, b))
t6 <- system.time(for(i in 1:n) r6 <- f6(y, b))

c(r1, r2, r3, r4, r5, r6) # equal results?
rbind(t1, t2, t3, t4, t5, t6) # Times


#================= C code =========================
// file: firstCond.c
// compile: R CMD SHLIB firstCond.c

#include <R.h>
#include <Rinternals.h>

SEXP firstTrue(SEXP X){
     R_len_t *which, i = 0, *x = LOGICAL(X);
     SEXP Which;

     PROTECT(Which = allocVector(INTSXP, 1));
     which = INTEGER(Which);

     while(x[i] == 0) i++;
     if(x[i] == 1)
         *which = ++i;
     else
         *which = NA_INTEGER;

     UNPROTECT(1);
     return(Which);
}

SEXP firstGreaterDbl(SEXP X, SEXP A){
     R_len_t *which, i = 0;
     double *x = REAL(X), *a = REAL(A);
     SEXP Which;

     PROTECT(Which = allocVector(INTSXP, 1));
     which = INTEGER(Which);

     while(x[i] <= a[0]) i++;
     if(x[i] > a[0])
         *which = ++i;
     else
         *which = NA_INTEGER;

     UNPROTECT(1);
     return(Which);
}


firstGreaterInt is exactly the same, substituting  'double' for 
'R_len_t' and INTEGER() for REAL()

Rui Barradas

Em 20-09-2012 22:33, R. Michael Weylandt escreveu:
> On Thu, Sep 20, 2012 at 4:39 PM, Mike Spam <ichmagspam at googlemail.com> wrote:
>> Thank you very much, especially Milan and Bert!
>> I will do some speedtests and fit the function to my needs.
>>
>> I think the best way would be a modified function in C...
>> But i am not familiar enough with C. Perhaps this would be a simple
>> but useful extension.
>> If someone has a solution, i would appreciate a post in this mailing list.
>>
>> Cheers and thanks to all,
>> Nico
>>
> 5 hours and Dirk hasn't taken the bait? I suppose I'll give it a try,
> though my Rcpp-fu is not great:
>
> #####################
>
> library(inline)
> library(Rcpp)
>
> ## which.first() for R
> which.first <- cxxfunction(signature(x = 'logical'), '
>    NumericVector xx(x);
>     // Rcpp magic which makes an integer vector xx from the SEXP x
>    int i = 0;
>
>    while(xx[i] != 1){
>      i++;
>    }
>    return(wrap(i + 1)); // Remember, c++ indices are 0-based
> ', plugin = "Rcpp")
>
> ######################
>
> This gives the first value for which x is true. If it's a specific
> condition you are evaluating, you could push that logic down to C++
> and put it inside the while loop as well to save time there but your
> original post didn't say what the logic was.
>
> Note that using this will require a working R development environment,
> which is harder on some systems than others.
>
> Cheers,
> Michael
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list