[Rd] speedbump in library

Mon Jan 26 12:36:27 CET 2015

>>>>> Winston Chang <winstonchang1 at gmail.com>
>>>>>     on Fri, 23 Jan 2015 10:15:53 -0600 writes:

    > I think you can simplify a little by replacing this:

    > 	pkg %in% loadedNamespaces()
    > with this:
    >   .getNamespace(pkg)

almost:  It would be 

      !is.null(.getNamespace(pkg))

    > Whereas getNamespace(pkg) will load the package if it's not already
    > loaded, calling .getNamespace(pkg) (note the leading dot) won't load
    > the package.

indeed.  
And you, Winston, are right that this new code snippet would be
an order of magnitude faster :

##-----------------------------------------------------------------------------

f1 <- function(pkg) pkg %in% loadedNamespaces()
f2 <- function(pkg) !is.null(.getNamespace(pkg))

require(microbenchmark)

pkg <- "foo"; (mbM <- microbenchmark(r1 <- f1(pkg), r2 <- f2(pkg))); stopifnot(identical(r1,r2)); r1
## Unit: microseconds
##           expr    min      lq     mean  median      uq    max neval cld
##  r1 <- f1(pkg) 38.516 40.9790 42.35037 41.7245 42.4060 82.922   100   b
##  r2 <- f2(pkg)  1.331  1.8285  2.13874  2.0855  2.3365  7.252   100  a
## [1] FALSE

pkg <- "stats"; (mbM <- microbenchmark(r1 <- f1(pkg), r2 <- f2(pkg))); stopifnot(identical(r1,r2)); r1
## Unit: microseconds
##           expr    min      lq     mean  median      uq    max neval cld
##  r1 <- f1(pkg) 29.955 31.2575 32.27748 31.6035 32.1215 62.428   100   b
##  r2 <- f2(pkg)  1.067  1.4315  1.71437  1.6335  1.8460  9.169   100  a
## [1] TRUE
loadNamespace("Matrix")
## <environment: namespace:Matrix>
pkg <- "Matrix"; (mbM <- microbenchmark(r1 <- f1(pkg), r2 <- f2(pkg))); stopifnot(identical(r1,r2)); r1
## Unit: microseconds
##           expr    min      lq     mean  median      uq    max neval cld
##  r1 <- f1(pkg) 32.721 33.5205 35.17450 33.9505 34.6050 65.373   100   b
##  r2 <- f2(pkg)  1.010  1.3750  1.93671  1.5615  1.7795 12.128   100  a
## [1] TRUE

##-----------------------------------------------------------------------------

Hence, indeed,
       		!is.null(.getNamespace(pkg))

seems equivalent to
      		 pkg %in% loadedNamespaces()

--- when 'pkg' is of length 1  (!!!)

but is 20 times faster....  and we have
11  occurrences  of   ' <...> %in%  loadedNamespaces() '
in the "base packages" in the R (devel) sources,
 3 in base,  2 in methods,  3 in stats, 2 in tools, 1 in utils..

On the other hand,     
       	     	 pkg %in% loadedNamespaces()

is extremely nicely readable code, whereas
		!is.null(.getNamespace(pkg))
is pretty much the contrary.
.. and well readable code is so much easier to maintain etc,
such that in many cases, code optimization with the cost of
code obfuscation is *not* desirable.

Of course we could yet again use a few lines of C and R code to
provide a new R lowlevel function, say

	      is.loadedNamespace()

which would be even faster than   !is.null(.getNamespace(pkg)) 

...
...

but do we have *any* evidence that this would noticably speedup
any higher level function such as library() ?

Thank you, again, Winston; you've opened an interesting topic!

--
Martin Maechler, ETH Zurich