[R] effective way to return only the first argument of "which()"

Bert Gunter gunter.berton at gene.com
Wed Sep 19 21:08:54 CEST 2012


Excellent point! Thanks.

-- Bert

On Wed, Sep 19, 2012 at 12:00 PM, Berend Hasselman <bhh at xs4all.nl> wrote:
>
> On 19-09-2012, at 20:02, Bert Gunter wrote:
>
>> Well, following up on this observation, which can be put under the
>> heading of "Sometimes vectorization can be much slower than explicit
>> loops" , I offer the following:
>>
>> firsti  <-function(x,k)
>> {
>>  i <- 1
>>  while(x[i]<=k){i <- i+1}
>>  i
>> }
>>
>>> system.time(for(i in 1:100)which(x>.99)[1])
>>   user  system elapsed
>>   19.1     2.4    22.2
>>
>>> system.time(for(i in 1:100)which.max(x>.99))
>>   user  system elapsed
>>  30.45    6.75   37.46
>>
>>> system.time(for(i in 1:100)firsti(x,.99))
>>   user  system elapsed
>>   0.03    0.00    0.03
>>
>> ## About a 500 - 1000 fold speedup !
>>
>>> firsti(x,.99)
>> [1] 122
>>
>> It doesn't seem to scale too badly, either (whatever THAT means!):
>> (Of course, the which() versions are essentially identical in timing,
>> and so are omitted)
>>
>>> system.time(for(i in 1:100)firsti(x,.9999))
>>   user  system elapsed
>>   2.70    0.00    2.72
>>
>>> firsti(x,.9999)
>> [1] 18200
>>
>> Of course, at some point, the explicit looping is awful -- with k =
>> .999999, the index was about 360000, and the timing test took 54
>> seconds.
>>
>> So I guess the point is -- as always -- that the optimal approach
>> depends on the nature of the data. Prudence and robustness clearly
>> demands the vectorized which() approaches if you have no information.
>> But if you do know something about the data, then you can often write
>> much faster tailored solutions. Which is hardly revelatory, of course.
>
> And compiling the firsti function can also be quite lucrative!
>
> firsti <- function(x,k)
> {
>     i <- 1
>     while(x[i]<=k){i <- i+1}
>     i
> }
>
> library(compiler)
> firsti.c <- cmpfun(firsti)
>
>> firsti(x,.99)
> [1] 93
>> firsti.c(x,.99)
> [1] 93
>
>> system.time(for(i in 1:100)firsti(x,.99))
>    user  system elapsed
>   0.014   0.000   0.013
>> system.time(for(i in 1:100)firsti.c(x,.99))
>    user  system elapsed
>   0.002   0.000   0.002
>
>> system.time(for(i in 1:100)firsti(x,.9999))
>    user  system elapsed
>   2.148   0.013   2.164
>> system.time(for(i in 1:100)firsti.c(x,.9999))
>    user  system elapsed
>   0.393   0.002   0.396
>
> And in a new run (without the above tests)  with k=.999999 the index was 1089653 and the timing for the uncompiled function was 152 seconds and the timing for the compiled function was 28.8 seconds!
>
> Berend
>



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm




More information about the R-help mailing list