[R] dist like function but where you can configure the method

Bert Gunter gunter.berton at gene.com
Fri May 16 23:13:26 CEST 2014


If the apply() call is not empty, its contents must of course be
interpreted. That's where the time goes.

> system.time(for(i in 1:1e6)rnorm(1))
   user  system elapsed
   5.25    0.00    5.29

> system.time(lapply(1:1e6,rnorm,n=1))
   user  system elapsed
   9.64    0.01    9.72

> system.time(vapply(1:1e6,rnorm,FUN.VALUE=0,n=1))
   user  system elapsed
   5.69    0.00    5.73
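
To see where the time goes, it may help to count how often the R-level function is evaluated instead of timing it. This is only a sketch: count_calls() is a hypothetical helper made up for illustration, not part of base R. Under that assumption, for, lapply() and vapply() each evaluate the function once per element, while apply(t(1:k), 1, f), the form used in the benchmark quoted below, sees a 1 x k matrix and so calls the function a single time on the whole vector.

```r
# Hypothetical helper (illustration only): run a construct and report how
# many times the R-level function f was evaluated.
count_calls <- function(run) {
  calls <- 0L
  f <- function(x, ...) {
    calls <<- calls + 1L   # record one evaluation of the interpreted function
    x
  }
  run(f)
  calls
}

k <- 1000L
n_for    <- count_calls(function(f) for (i in seq_len(k)) f(i))
n_lapply <- count_calls(function(f) lapply(seq_len(k), f))
n_vapply <- count_calls(function(f) vapply(seq_len(k), f, FUN.VALUE = 0L))
# t(seq_len(k)) is a 1 x k matrix, so MARGIN = 1 visits exactly one row
n_apply  <- count_calls(function(f) apply(t(seq_len(k)), 1, f))

print(c(`for` = n_for, lapply = n_lapply, vapply = n_vapply, apply = n_apply))
```

If the counts differ, the timings are not comparing the same amount of work: a single vectorised call is expected to be fast whichever construct wraps it.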


I rest my case.
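
As for the original question quoted below (a dist-like function where the method is configurable), a plain double loop is probably good enough. Here is a minimal sketch, under some assumptions: generic_dist() is a hypothetical name, not a function in stats; it compares rows, as stats::dist does; and FUN is assumed to be symmetric and to return a single number.

```r
# Sketch of a dist()-like function with a user-supplied pairwise function.
# generic_dist() is a hypothetical name; it is not part of the stats package.
generic_dist <- function(x, FUN, ...) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  if (n >= 2L) {
    for (i in seq_len(n - 1L)) {
      for (j in (i + 1L):n) {
        # fill the lower triangle only; FUN is assumed to be symmetric
        d[j, i] <- FUN(x[i, ], x[j, ], ...)
      }
    }
  }
  as.dist(d)   # as.dist() reads the lower triangle of d
}

# Usage: Euclidean distance should agree with stats::dist ...
set.seed(1)
m <- matrix(rnorm(20), nrow = 4)
euclid <- function(a, b) sqrt(sum((a - b)^2))
d_euclid <- generic_dist(m, euclid)
print(all.equal(as.vector(d_euclid), as.vector(dist(m))))

# ... and any other function of two vectors can be plugged in, e.g. the
# Kolmogorov-Smirnov statistic used as the placeholder in this thread.
d_ks <- generic_dist(m, function(a, b) ks.test(a, b)$statistic)
print(d_ks)
```

The loops cost next to nothing compared with n*(n-1)/2 calls to FUN, so there is little to gain from hiding them in an apply-style call.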

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Fri, May 16, 2014 at 1:00 PM, Witold E Wolski <wewolski at gmail.com> wrote:
> Ouch,
>
> First: my question was not how to implement dist but whether there is a
> more generic dist function than stats::dist.
>
> Second: ks.test is meant as a placeholder (see the comment in the
> code I sent) for any other function taking two vector arguments.
>
> Third: I do subscribe to the idea that a function call is easier to
> read and understand than a for loop. @Bert apply is a native C
> function and the loop is not interpreted AFAIK
>
> @Rui @Barry @Jari What do you benchmark? an empty loop?
>
> Look at the trivial benchmarks below: _apply_ clearly outperforms a
> for loop in R. It always has; it even outperforms an empty for loop.
>
> # an empty, unrealistic for loop as suggested by Rui, Barry and Jari
> f1 <- function(n){
>   for(i in 1:n){
>     for(j in 1:n){
>     }
>   }}
>
>
> myfunc = function(x,y=x){x-y}
>
> # a for loop which does actually something
> f2 <- function(n){
>   mm <- matrix(0,ncol=n,nrow=n)
>   for(i in 1:n){
>     for(j in 1:n){
>       mm[i,j] = myfunc(i,j)
>     }
>   }
>   return(mm)
> }
>
> # an array
> f3 = function(n){
>   res = rep(0,n*n)
>   for(i in 1:(n*n))
>   {
>     res[i] = myfunc(i)
>   }
>   return(res)
> }
>
>
> n = 1000
> system.time(f1(n))
> system.time(f2(n))
> system.time(f3(n))
> system.time(apply(t(1:(n*n)),1,myfunc))
>
>
>> system.time(f1(n))
>        user      system     elapsed
>        0.28        0.00        0.28
>> system.time(f2(n))
>        user      system     elapsed
>        6.80        0.00        7.09
>> system.time(f3(n))
>        user      system     elapsed
>        5.83        0.00        5.98
>> system.time(apply(t(1:(n*n)),1,myfunc))
>        user      system     elapsed
>        0.19        0.00        0.19
>
>
>
>
>
>
> On 16 May 2014 20:55, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> Hello,
>>
>> The compiler package is good at speeding up for loops but in this case the
>> gain is negligible. The KS test is what actually takes the time.
>>
>> library(compiler)
>>
>> f1 <- function(n){
>>   for(i in 1:100){
>>     for(j in 1:100){
>>       ks.test(runif(100),runif(100))
>>     }
>>   }
>> }
>>
>> f1.c <- cmpfun(f1)
>>
>> system.time(f1())
>>    user  system elapsed
>>    3.50    0.00    3.53
>> system.time(f1.c())
>>    user  system elapsed
>>    3.47    0.00    3.48
>>
>>
>> Rui Barradas
>>
>> On 16-05-2014 17:12, Barry Rowlingson wrote:
>>>
>>> On Fri, May 16, 2014 at 4:46 PM, Witold E Wolski <wewolski at gmail.com>
>>> wrote:
>>>>
>>>> Dear Jari,
>>>>
>>>> Thanks for your reply...
>>>>
>>>> The overhead would be
>>>> 2 for loops
>>>> for(i in 1:dim(x)[2])
>>>> for(j in i:dim(x)[2])
>>>>
>>>> isn't it? Or are you seeing a different way to implement it?
>>>>
>>>> A for loop is pretty expensive in R. Therefore I am looking for an
>>>> implementation similar to apply or lapply where the iteration is done
>>>> in native code.
>>>
>>>
>>> No, a for loop is not pretty expensive in R -- at least not compared
>>> to doing a k-s test:
>>>
>>>   > system.time(for(i in 1:10000){ks.test(runif(100),runif(100))})
>>>     user  system elapsed
>>>    3.680   0.012   3.697
>>>
>>>   3.68 seconds to do 10000 ks tests (and generate 200 runifs for each)
>>>
>>>   > system.time(for(i in 1:10000){})
>>>     user  system elapsed
>>>    0.000   0.000   0.001
>>>
>>>   0.000s to do 10000 loops. Oh, let's nest it for fun:
>>>
>>>   > system.time(for(i in 1:100){for(i in 1:100){ks.test(runif(100),runif(100))}})
>>>     user  system elapsed
>>>    3.692   0.004   3.701
>>>
>>>   no different. Even a ks-test with only 5 items is taking me 2.2 seconds.
>>>
>>> Moral: don't worry about the for loops.
>>>
>>> Barry
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
>
>
> --
> Witold Eryk Wolski
>


