[Rd] Style question

Fri May 30 23:40:05 CEST 2014

Using `::` does add some overhead - on the order of 5-10 microseconds
on my computer. Still, it would take 100,000 calls to add 0.5-1 second
of delay.
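
As a rough check of that arithmetic (a sketch only; the variable names are
made up and the per-call figures are the approximate range quoted above):

```r
per_call_us <- c(5, 10)      # approximate overhead per `::` call, in microseconds
n_calls <- 1e5               # 100,000 calls
per_call_us * n_calls / 1e6  # total added delay, in seconds
```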

library(microbenchmark)
microbenchmark(
  base::identity(1),
  identity(1),
  unit = "us"
)
# Unit: microseconds
#               expr   min     lq median     uq    max neval
#  base::identity(1) 5.677 6.2180 6.6695 7.3655 60.104   100
#        identity(1) 0.262 0.2965 0.3210 0.4035  1.034   100

This test isn't exactly like putting identity in imports, since in
this case the number of environments to search is greater -- but it's
reasonably close.
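
To make the "number of environments" point concrete, here is a rough
sketch that walks the chain of enclosing environments until a name is
found. lookup_depth is a made-up helper, and stats is used only as a
stand-in for a package namespace; a function listed in imports would be
found one step earlier than a base function, in the imports environment.
R's real lookup may apply optimisations, so treat this as an illustration
of the chain and not as a cost model:

```r
# Hypothetical helper: count how many environments are visited, starting
# from `env`, before a binding called `name` is found.
lookup_depth <- function(name, env) {
  depth <- 1L
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(depth)
    }
    env <- parent.env(env)
    depth <- depth + 1L
  }
  NA_integer_  # not found anywhere on the chain
}

# From the global environment, package:base sits at the end of the
# search path, so every attached package is visited first.
lookup_depth("identity", globalenv())

# From inside a package namespace the chain is
# namespace -> imports -> base namespace -> global environment -> ...
lookup_depth("identity", asNamespace("stats"))
```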

If you're in a situation where you want to be explicit about where a
function came from, but the slowness of `::` is an issue, you could
create a variable that points to the environment and access the
function using $:

base <- as.environment('package:base')
microbenchmark(
  base::identity(1),
  base$identity(1),
  identity(1),
  unit = "us"
)
# Unit: microseconds
#               expr   min     lq median     uq    max neval
#  base::identity(1) 5.520 6.0795 6.4485 7.0020 32.232   100
#   base$identity(1) 0.504 0.5940 0.6635 0.8105  7.701   100
#        identity(1) 0.248 0.2815 0.3100 0.3885  7.925   100
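
The same idea should carry over to packages other than base. A minimal
sketch, assuming the tools package and the file_ext() call from the
example quoted further down; tools_ns and file_ext_cached are names made
up for illustration. asNamespace() is used here instead of
as.environment() so that the package does not have to be attached:

```r
# Hypothetical names: tools_ns and file_ext_cached are illustrative only.

# Look up the namespace once; this loads 'tools' if needed but does not
# attach it to the search path.
tools_ns <- asNamespace("tools")

# Option 1: keep the package visible at the call site via `$`.
tools_ns$file_ext("text.txt")

# Option 2: resolve the function once and call the cached binding.
file_ext_cached <- tools::file_ext
file_ext_cached("text.txt")
```

Either way the `::` lookup happens once, not on every call.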

-Winston

On Fri, May 30, 2014 at 2:53 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
> Hi Gabe,
>
>
> On 05/30/2014 11:34 AM, Gabriel Becker wrote:
>>
>> This isn't likely to make much difference in most cases, but calling a
>> function via :: can incur roughly twice the overhead of calling an
>> imported function:
>>
>>  > fun1
>> function ()
>> file_ext("text.txt")
>> <environment: namespace:imptest>
>>  > fun2
>> function ()
>> tools::file_ext("text.txt")
>> <environment: namespace:imptest>
>>  > microbenchmark(fun1(), times=10000)
>> Unit: microseconds
>>     expr    min     lq median      uq     max neval
>>   fun1() 24.506 25.654 26.324 27.8795 154.001 10000
>>  > microbenchmark(fun2(), times=10000)
>> Unit: microseconds
>>     expr    min      lq  median      uq     max neval
>>   fun2() 42.723 46.6945 48.8685 52.0595 2021.91 10000
>
>
> Interesting. Or with a void function so the timing more closely
> reflects the time it takes to look up the symbol:
>
>   > void
>   function ()
>   NULL
>   <environment: namespace:S4Vectors>
>
>   > fun1
>   function ()
>   void()
>   <environment: namespace:IRanges>
>
>   > fun2
>   function ()
>   S4Vectors::void()
>   <environment: namespace:IRanges>
>
>   > microbenchmark(fun1(), times=10000)
>   Unit: nanoseconds
>
>      expr min  lq median  uq   max neval
>    fun1() 261 268    270 301 11960 10000
>
>   > microbenchmark(fun2(), times=10000)
>   Unit: microseconds
>      expr    min     lq median     uq      max neval
>    fun2() 13.486 14.918 15.782 16.753 60542.19 10000
>
> S4Vectors::void() is about 60x slower than void()!
>
> Cheers,
> H.
>
>>
>> Also, if one uses roxygen2 (or even if one doesn't), putting
>> ##' @importFrom above the function doing the calling documents this.
>>
>> And of course if you need to know where a function lives, environment()
>> will tell you.
>>
>> ~G
>>
>>
>> On Fri, May 30, 2014 at 10:00 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
>>
>>      > There is at least one subtle consequence to keep in mind when doing
>>      > this. Of course, whatever choice you make, if the whatever() function
>>      > moves to a different package, this breaks your package.
>>      > However, if you explicitly import the function, your package will
>>      > break at load-time (which is good) and you'll only have to modify
>>      > 1 line in the NAMESPACE file to fix it. But if you do foo::whatever(),
>>      > your package won't break at load-time, only at run-time. Also you'll
>>      > have to edit all the calls to foo::whatever() to fix the package.
>>      >
>>      > Probably not a big deal, but in an environment like Bioconductor where
>>      > infrastructure classes and functions can be shared by hundreds of
>>      > packages, having people use foo::whatever() in a systematic way would
>>      > probably make maintenance a little bit more painful than it needs to
>>      > be when the need arises to reorganize/refactor parts of the
>>      > infrastructure. Also, the ability to quickly grep the NAMESPACE
>>      > files of all BioC packages to see who imports what is very convenient
>>      > in this situation.
>>
>>     OTOH, I think there's a big benefit to being able to read package code
>>     and instantly know where a function comes from.
>>
>>     Personally, I found this outweighs the benefits that you outline:
>>
>>     * functions rarely move between packages, and gsubbing pkga::foo to
>>     pkgb::foo isn't hard
>>     * it's not that much harder to grep for pkg::foo in R/* than it is to
>>     grep NAMESPACE
>>
>>     Hadley
>>
>>     --
>>     http://had.co.nz/
>>
>>     ______________________________________________
>>     R-devel at r-project.org mailing list
>>
>>     https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>>
>>
>>
>> --
>> Gabriel Becker
>> Graduate Student
>> Statistics Department
>> University of California, Davis
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>