[Rd] cache most-recent dispatch
John Chambers
jmc at r-project.org
Tue Jul 2 17:41:51 CEST 2013
It's hard to see how repeated dispatch on the same classes can be that
slow, _if_ the function being called each time is itself doing some
substantial work.
The first call (in a session) with a particular signature searches for
inherited methods and stores the method found in a table. The following
calls with that signature should do a single lookup in a hash table.
Caching the last signature is unlikely to be dramatically faster, but we
can experiment and see.
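As a rough illustration of the "search once, then look up" pattern described above, here is a toy sketch. It is not the code in the methods package: the classes Animal and Dog, the table `method_table`, the list `defined_methods`, and the helper `find_method` are all hypothetical names invented for this example.

```r
library(methods)

# Hypothetical classes, used only to get a small inheritance chain.
setClass("Animal", representation(name = "character"))
setClass("Dog", contains = "Animal")

# Toy stand-in for a generic's method table: class name -> function.
method_table <- new.env(hash = TRUE)
slow_searches <- 0L   # counts how often the full inheritance search runs

# Methods "defined" by the user; note that none is defined for Dog itself.
defined_methods <- list(Animal = function(x) "Animal method")

find_method <- function(x) {
  sig <- class(x)[1L]
  cached <- method_table[[sig]]
  if (!is.null(cached)) return(cached)        # fast path: one hash lookup
  slow_searches <<- slow_searches + 1L        # slow path: walk superclasses
  for (cl in is(x)) {
    m <- defined_methods[[cl]]
    if (!is.null(m)) {
      assign(sig, m, envir = method_table)    # store the inherited method
      return(m)
    }
  }
  stop("no applicable method for class ", sig)
}

d <- new("Dog", name = "rex")
for (i in 1:1000) find_method(d)(d)
slow_searches   # the inheritance search ran only on the first call
```

In this sketch only the first call pays for the search; every later call with the same class costs one environment lookup, which is why caching just the last signature would probably save little on top of that.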
What is substantially different is calling a generic function vs calling
a primitive or internal. If the local paste you constructed is the
default method, base::paste, then it is just a thin wrapper around a .Internal.
Not going through the R generic function several thousand times would
make a difference.
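One way to check which kind of function is actually being called is sketched below. It assumes a fresh session in which `paste` has not yet been made generic; the helper `is_internal_wrapper` is a hypothetical name introduced here, not part of any package.

```r
library(methods)

# Hypothetical helper: does the function body hand off to .Internal?
is_internal_wrapper <- function(f) {
  any(grepl(".Internal", deparse(body(f)), fixed = TRUE))
}
is_internal_wrapper(base::paste)

# After setGeneric(), the visible 'paste' is an S4 generic function, so
# every call goes through method dispatch before reaching base::paste.
setGeneric("paste")
is(paste, "genericFunction")

# Selecting the default method once gives a function that can be called
# directly, skipping the generic on each subsequent call.
direct <- selectMethod("paste", "ANY")
direct(c("a", "b"), collapse = "+")

# Clean up so later code sees base::paste again.
removeGeneric("paste")
```

This is the same idea as the local `paste` in fun1 of the quoted message: the lookup happens once, outside the loop, and the loop then calls the selected method rather than the generic.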
It's a fundamental point about R that function calls do enough work that
they add significant time to a "trivial" computation, such as a
primitive call. There are various efforts going on these days to
provide more efficient alternatives. They're all helpful; my personal
favorite when the game is worth it is to consider doing key computations
in a seriously faster language, like C++ via Rcpp.
John
On 7/1/13 10:04 PM, Valerie Obenchain wrote:
> Hi,
>
> S4 method dispatch can be very slow. Would it be reasonable to cache the
> most recent dispatch, anticipating the next invocation will be on the same
> type? This would be very helpful in loops.
>
> fun0 <- function(x)
>     sapply(x, paste, collapse="+")
> fun1 <- function(x) {
>     paste <- selectMethod(paste, class(x[[1]]))
>     sapply(x, paste, collapse="+")
> }
> lst <- split(rep(LETTERS, 100), rep(1:1300, 2))
>
> library(microbenchmark)
> microbenchmark(fun0(lst), times=10)
> ## Unit: milliseconds
> ##       expr      min       lq   median      uq      max neval
> ##  fun0(lst) 4.153287 4.180659 4.513539 5.19261 5.280481    10
>
> setGeneric("paste")
> microbenchmark(fun0(lst), fun1(lst), times=10)
> ## Unit: milliseconds
> ##       expr       min       lq    median        uq       max neval
> ##  fun0(lst) 21.093180 21.27616 21.453174 21.833686 24.758791    10
> ##  fun1(lst)  4.517808  4.53067  4.582641  4.682235  5.121856    10
>
> Dispatch seems to be especially slow when packages are involved, e.g.,
> with the Bioconductor IRanges package
> (http://bioconductor.org/packages/release/bioc/html/IRanges.html)
>
> removeGeneric("paste")
> library(IRanges)
> showMethods(paste)
> ## Function: paste (package BiocGenerics)
> ## ...="ANY"
> ## ...="Rle"
> selectMethod(paste, "ANY")
> ## Method Definition (Class "derivedDefaultMethod"):
> ##
> ## function (..., sep = " ", collapse = NULL)
> ## .Internal(paste(list(...), sep, collapse))
> ## <environment: namespace:base>
> ##
> ## Signatures:
> ## ...
> ## target "ANY"
> ## defined "ANY"
>
> microbenchmark(fun0(lst), fun1(lst), times=10)
> ## Unit: milliseconds
> ##       expr        min         lq     median         uq        max neval
> ##  fun0(lst) 233.539585 234.592491 236.311209 237.268506 243.181123    10
> ##  fun1(lst)   4.564914   4.592996   4.642898   4.729009   5.492706    10
>
> sessionInfo()
> ## R version 3.0.0 Patched (2013-04-04 r62492)
> ## Platform: x86_64-unknown-linux-gnu (64-bit)
> ##
> ## locale:
> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> ## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> ## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> ## [7] LC_PAPER=C LC_NAME=C
> ## [9] LC_ADDRESS=C LC_TELEPHONE=C
> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> ##
> ## attached base packages:
> ## [1] parallel  stats     graphics  grDevices utils     datasets  methods
> ## [8] base
> ##
> ## other attached packages:
> ## [1] IRanges_1.19.15 BiocGenerics_0.7.2 microbenchmark_1.3-0
> ##
> ## loaded via a namespace (and not attached):
> ## [1] stats4_3.0.0
>
>
> Thanks,
> Valerie
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel