[R] effective way to return only the first argument of "which()"

Berend Hasselman bhh at xs4all.nl
Wed Sep 19 21:00:47 CEST 2012


On 19-09-2012, at 20:02, Bert Gunter wrote:

> Well, following up on this observation, which can be put under the
> heading of "Sometimes vectorization can be much slower than explicit
> loops" , I offer the following:
> 
> firsti <- function(x,k)
> {
>  i <- 1
>  while(x[i]<=k){i <- i+1}
>  i
> }
> 
>> system.time(for(i in 1:100)which(x>.99)[1])
>   user  system elapsed
>   19.1     2.4    22.2
> 
>> system.time(for(i in 1:100)which.max(x>.99))
>   user  system elapsed
>  30.45    6.75   37.46
> 
>> system.time(for(i in 1:100)firsti(x,.99))
>   user  system elapsed
>   0.03    0.00    0.03
> 
> ## About a 500- to 1000-fold speedup!
> 
>> firsti(x,.99)
> [1] 122
> 
> It doesn't seem to scale too badly, either (whatever THAT means!):
> (Of course, the which() versions are essentially identical in timing,
> and so are omitted)
> 
>> system.time(for(i in 1:100)firsti(x,.9999))
>   user  system elapsed
>   2.70    0.00    2.72
> 
>> firsti(x,.9999)
> [1] 18200
> 
> Of course, at some point, the explicit looping is awful -- with k =
> .999999, the index was about 360000, and the timing test took 54
> seconds.
> 
> So I guess the point is -- as always -- that the optimal approach
> depends on the nature of the data. Prudence and robustness clearly
> demand the vectorized which() approaches if you have no information.
> But if you do know something about the data, then you can often write
> much faster tailored solutions. Which is hardly revelatory, of course.
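A possible middle ground between the fully vectorized which() and the element-by-element loop, offered only as a rough sketch: scan x in blocks, apply which() to one block at a time, and stop at the first block that contains a hit. The name firsti_chunked, the default block size of 10000, the seed, and the runif() test data are all assumptions for illustration; none of them come from the thread.

```r
# Hypothetical helper (not from the thread): scan x in blocks of `chunk`
# elements and stop at the first block containing a value greater than k.
firsti_chunked <- function(x, k, chunk = 10000L) {
    n <- length(x)
    start <- 1L
    while (start <= n) {
        end <- min(start + chunk - 1L, n)
        hit <- which(x[start:end] > k)   # vectorized test on one block only
        if (length(hit) > 0L) return(start + hit[1L] - 1L)
        start <- end + 1L
    }
    NA_integer_                          # no element exceeds k
}

set.seed(42)        # assumed seed, for reproducibility only
x <- runif(1e6)     # assumed test data; the thread does not show how x was built

# The blockwise search should agree with the fully vectorized answer.
stopifnot(identical(firsti_chunked(x, .9999), which(x > .9999)[1]))
```

The block size trades the two costs against each other: small blocks behave like the explicit loop, one huge block behaves like which() on the whole vector. Which value works best would have to be measured on the data at hand.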

And compiling the firsti function can also pay off nicely!

firsti <- function(x,k)
{
    i <- 1
    while(x[i]<=k){i <- i+1}
    i
}

library(compiler)
firsti.c <- cmpfun(firsti)

> firsti(x,.99)
[1] 93
> firsti.c(x,.99)
[1] 93
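The index here (93) differs from the 122 in the quoted message, presumably because each of us generated a different random x; the thread does not show how x was created. To make the comparison repeatable, something along these lines could be used. The seed, the vector length, and the use of runif() are assumptions, not the data used for the timings in this thread.

```r
library(compiler)

set.seed(2012)      # arbitrary seed (an assumption)
x <- runif(1e6)     # assumed data: uniform draws on (0, 1)

firsti <- function(x, k)
{
    i <- 1
    while(x[i]<=k){i <- i+1}
    i
}
firsti.c <- cmpfun(firsti)

# Loop, compiled loop, and vectorized which() should all agree on the index.
stopifnot(firsti(x, .99) == which(x > .99)[1],
          firsti.c(x, .99) == firsti(x, .99))
```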

> system.time(for(i in 1:100)firsti(x,.99))
   user  system elapsed 
  0.014   0.000   0.013 
> system.time(for(i in 1:100)firsti.c(x,.99))
   user  system elapsed 
  0.002   0.000   0.002 
 
> system.time(for(i in 1:100)firsti(x,.9999))
   user  system elapsed 
  2.148   0.013   2.164 
> system.time(for(i in 1:100)firsti.c(x,.9999))
   user  system elapsed 
  0.393   0.002   0.396 
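A note for anyone repeating these timings on a newer R: since R 3.4.0 the byte-code JIT is enabled by default, so a plain closure such as firsti gets compiled automatically after its first calls and the gap to cmpfun() may largely disappear. The sketch below switches the JIT off for the duration of the comparison. The seed and the runif() data are assumptions, and no timings are shown because they depend on machine and R version.

```r
library(compiler)

old <- enableJIT(0)   # turn the JIT off; returns the previous JIT level

firsti <- function(x, k)
{
    i <- 1
    while(x[i]<=k){i <- i+1}
    i
}
firsti.c <- cmpfun(firsti)   # explicit byte-compilation still works

set.seed(1)           # assumed seed
x <- runif(1e6)       # assumed test data

t.plain <- system.time(for(i in 1:100) firsti(x, .9999))
t.comp  <- system.time(for(i in 1:100) firsti.c(x, .9999))
print(t.plain)        # interpreted version
print(t.comp)         # byte-compiled version

enableJIT(old)        # restore the previous JIT level
```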

And in a new run (without the above tests) with k=.999999, the index was 1089653. The uncompiled function took 152 seconds and the compiled function 28.8 seconds!
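One caveat with thresholds this extreme: if no element of x exceeds k, firsti runs past the end of x, x[i] becomes NA, and the while() condition fails with a "missing value where TRUE/FALSE needed" error. A guarded variant could look like the sketch below. The name firsti_safe is made up for illustration, and the sketch assumes x itself contains no NA values.

```r
# Variant of firsti with a bounds check (an addition, not the thread's code).
# Assumes x contains no NA values.
firsti_safe <- function(x, k)
{
    n <- length(x)
    i <- 1L
    # Test i <= n first so that x[i] is never read past the end of x.
    while(i <= n && x[i] <= k){i <- i + 1L}
    if (i > n) NA_integer_ else i
}

firsti_safe(c(.2, .5, .995), .99)   # third element is the first one > .99
firsti_safe(c(.2, .5), .99)         # no such element: returns NA
```

The extra comparison in the loop condition adds some cost per iteration, so the guarded version will be somewhat slower than the original.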

Berend



