[R] effective way to return only the first argument of "which()"
Berend Hasselman
bhh at xs4all.nl
Wed Sep 19 21:00:47 CEST 2012
On 19-09-2012, at 20:02, Bert Gunter wrote:
> Well, following up on this observation, which can be put under the
> heading of "Sometimes vectorization can be much slower than explicit
> loops" , I offer the following:
>
> firsti <- function(x,k)
> {
>     i <- 1
>     while(x[i]<=k){i <- i+1}
>     i
> }
>
>> system.time(for(i in 1:100)which(x>.99)[1])
>    user  system elapsed
>    19.1     2.4    22.2
>
>> system.time(for(i in 1:100)which.max(x>.99))
>    user  system elapsed
>   30.45    6.75   37.46
>
>> system.time(for(i in 1:100)firsti(x,.99))
>    user  system elapsed
>    0.03    0.00    0.03
>
> ## About a 500- to 1000-fold speedup!
>
>> firsti(x,.99)
> [1] 122
>
> It doesn't seem to scale too badly, either (whatever THAT means!):
> (Of course, the which() versions are essentially identical in timing,
> and so are omitted)
>
>> system.time(for(i in 1:100)firsti(x,.9999))
>    user  system elapsed
>    2.70    0.00    2.72
>
>> firsti(x,.9999)
> [1] 18200
>
> Of course, at some point, the explicit looping is awful -- with k =
> .999999, the index was about 360000, and the timing test took 54
> seconds.
>
> So I guess the point is -- as always -- that the optimal approach
> depends on the nature of the data. Prudence and robustness clearly
> demand the vectorized which() approaches if you have no information.
> But if you do know something about the data, then you can often write
> much faster tailored solutions. Which is hardly revelatory, of course.
And byte-compiling the firsti function can also pay off nicely!
firsti <- function(x,k)
{
    i <- 1
    while(x[i]<=k){i <- i+1}
    i
}
library(compiler)
firsti.c <- cmpfun(firsti)
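For anyone wanting to reproduce this: the vector x is not defined in this message, so the sketch below simply assumes a long vector of uniform random numbers (the seed and the length are arbitrary choices of mine, not values from the thread). It checks that the vectorized, interpreted and compiled versions agree on the index before any timing is done.

```r
library(compiler)

# Assumption: x is a long vector of uniform random numbers. The original
# message does not show how x was created; seed and length are arbitrary.
set.seed(1)
x <- runif(1e6)

# Same loop as in the message: walk along x until an element exceeds k.
firsti <- function(x, k) {
  i <- 1
  while (x[i] <= k) i <- i + 1
  i
}
firsti.c <- cmpfun(firsti)

# All three approaches should report the same index of the first x > k.
k <- 0.99
idx <- c(which    = which(x > k)[1],
         loop     = firsti(x, k),
         compiled = firsti.c(x, k))
print(idx)
```

Note that in R 3.4.0 and later the byte-code JIT is enabled by default, so a plain firsti is likely to be compiled automatically after a few calls and the gap to firsti.c may be much smaller than in the timings here.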
> firsti(x,.99)
[1] 93
> firsti.c(x,.99)
[1] 93
> system.time(for(i in 1:100)firsti(x,.99))
   user  system elapsed
  0.014   0.000   0.013
> system.time(for(i in 1:100)firsti.c(x,.99))
   user  system elapsed
  0.002   0.000   0.002
> system.time(for(i in 1:100)firsti(x,.9999))
   user  system elapsed
  2.148   0.013   2.164
> system.time(for(i in 1:100)firsti.c(x,.9999))
   user  system elapsed
  0.393   0.002   0.396
And in a new run (without the above tests) with k=.999999, the index was 1089653; the uncompiled function took 152 seconds and the compiled function took 28.8 seconds!
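One caveat with the loop as written: if no element of x exceeds k, x[i] eventually indexes past the end of the vector, the comparison yields NA, and while() stops with an error. A minimal guarded sketch follows; the name firsti.safe is hypothetical (not from the thread), and it assumes x contains no NA values. It returns NA when nothing is found, which matches what which(x > k)[1] gives in that case.

```r
# Hypothetical guarded variant of firsti (name is illustrative only).
# Assumes x has no NA values.
firsti.safe <- function(x, k) {
  n <- length(x)
  i <- 1L
  # Stop at the end of x instead of indexing past it.
  while (i <= n && x[i] <= k) i <- i + 1L
  if (i > n) NA_integer_ else i
}

x.small <- c(0.2, 0.5, 0.995, 0.1)
firsti.safe(x.small, 0.99)    # 3
firsti.safe(x.small, 0.999)   # NA, same as which(x.small > 0.999)[1]
```

The extra bound check costs a little on every iteration, so this trades some of the speedup for robustness.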
Berend