[R] effective way to return only the first argument of "which()"
Bert Gunter
gunter.berton at gene.com
Wed Sep 19 21:08:54 CEST 2012
Excellent point! Thanks.
-- Bert
On Wed, Sep 19, 2012 at 12:00 PM, Berend Hasselman <bhh at xs4all.nl> wrote:
>
> On 19-09-2012, at 20:02, Bert Gunter wrote:
>
>> Well, following up on this observation, which can be put under the
>> heading of "Sometimes vectorization can be much slower than explicit
>> loops" , I offer the following:
>>
>> firsti <-function(x,k)
>> {
>> i <- 1
>> while(x[i]<=k){i <- i+1}
>> i
>> }
>>
>>> system.time(for(i in 1:100)which(x>.99)[1])
>> user system elapsed
>> 19.1 2.4 22.2
>>
>>> system.time(for(i in 1:100)which.max(x>.99))
>> user system elapsed
>> 30.45 6.75 37.46
>>
>>> system.time(for(i in 1:100)firsti(x,.99))
>> user system elapsed
>> 0.03 0.00 0.03
>>
>> ## About a 500 - 1000 fold speedup !
>>
>>> firsti(x,.99)
>> [1] 122
>>
>> It doesn't seem to scale too badly, either (whatever THAT means!):
>> (Of course, the which() versions are essentially identical in timing,
>> and so are omitted)
>>
>>> system.time(for(i in 1:100)firsti(x,.9999))
>> user system elapsed
>> 2.70 0.00 2.72
>>
>>> firsti(x,.9999)
>> [1] 18200
>>
>> Of course, at some point, the explicit looping is awful -- with k =
>> .999999, the index was about 360000, and the timing test took 54
>> seconds.
>>
>> So I guess the point is -- as always -- that the optimal approach
>> depends on the nature of the data. Prudence and robustness clearly
>> demands the vectorized which() approaches if you have no information.
>> But if you do know something about the data, then you can often write
>> much faster tailored solutions. Which is hardly revelatory, of course.
>
> And compiling the firsti function can also be quite lucrative!
>
> firsti <- function(x,k)
> {
> i <- 1
> while(x[i]<=k){i <- i+1}
> i
> }
>
> library(compiler)
> firsti.c <- cmpfun(firsti)
>
>> firsti(x,.99)
> [1] 93
>> firsti.c(x,.99)
> [1] 93
>
>> system.time(for(i in 1:100)firsti(x,.99))
> user system elapsed
> 0.014 0.000 0.013
>> system.time(for(i in 1:100)firsti.c(x,.99))
> user system elapsed
> 0.002 0.000 0.002
>
>> system.time(for(i in 1:100)firsti(x,.9999))
> user system elapsed
> 2.148 0.013 2.164
>> system.time(for(i in 1:100)firsti.c(x,.9999))
> user system elapsed
> 0.393 0.002 0.396
>
> And in a new run (without the above tests) with k=.999999 the index was 1089653 and the timing for the uncompiled function was 152 seconds and the timing for the compiled function was 28.8 seconds!
>
> Berend
>
--
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
More information about the R-help
mailing list