[Rd] cache most-recent dispatch

Valerie Obenchain vobencha at fhcrc.org
Wed Jul 3 05:40:51 CEST 2013


Thanks for the background and suggestions.

Valerie


On 07/02/2013 08:41 AM, John Chambers wrote:
> It's hard to see how repeated dispatch on the same classes can be that
> slow, _if_ the function being called each time is itself doing some
> substantial work.
>
> The first call (in a session) with a particular signature searches for
> inherited methods and stores the method found in a table.  The following
> calls with that signature should do a single lookup in a hash table.
> Caching the last signature is unlikely to be dramatically faster, but we
> can experiment and see.
>
> What is substantially different is calling a generic function vs calling
> a primitive or internal.  If the local paste you constructed is the
> default, base::paste, that is a .Internal.
>
> Not going through the R generic function several thousand times would
> make a difference.
>
> It's a fundamental point about R that function calls do enough work that
> they add significant time to a "trivial" computation, such as a
> primitive call.  There are various efforts going on these days to
> provide more efficient alternatives.  They're all helpful; my personal
> favorite when the game is worth it is to consider doing key computations
> in a seriously faster language, like C++ via Rcpp.
>
> John
>
> On 7/1/13 10:04 PM, Valerie Obenchain wrote:
>> Hi,
>>
>> S4 method dispatch can be very slow. Would it be reasonable to cache the
>> most recent dispatch, anticipating the next invocation will be on the
>> same type? This would be very helpful in loops.
>>
>>    fun0 <- function(x)
>>        sapply(x, paste, collapse="+")
>>    fun1 <- function(x) {
>>        paste <- selectMethod(paste, class(x[[1]]))
>>        sapply(x, paste, collapse="+")
>>    }
>>    lst <- split(rep(LETTERS, 100), rep(1:1300, 2))
>>
>>    library(microbenchmark)
>>    microbenchmark(fun0(lst), times=10)
>>    ## Unit: milliseconds
>>    ##       expr      min       lq   median      uq      max neval
>>    ##  fun0(lst) 4.153287 4.180659 4.513539 5.19261 5.280481    10
>>
>>    setGeneric("paste")
>>    microbenchmark(fun0(lst), fun1(lst), times=10)
>>    ## Unit: milliseconds
>>    ##       expr       min       lq    median        uq       max neval
>>    ##  fun0(lst) 21.093180 21.27616 21.453174 21.833686 24.758791    10
>>    ##  fun1(lst)  4.517808  4.53067  4.582641  4.682235  5.121856    10
>>
>> Dispatch seems to be especially slow when packages are involved, e.g.,
>> with the Bioconductor IRanges package
>> (http://bioconductor.org/packages/release/bioc/html/IRanges.html)
>>
>>    removeGeneric("paste")
>>    library(IRanges)
>>    showMethods(paste)
>>    ## Function: paste (package BiocGenerics)
>>    ## ...="ANY"
>>    ## ...="Rle"
>>    selectMethod(paste, "ANY")
>>    ## Method Definition (Class "derivedDefaultMethod"):
>>    ##
>>    ## function (..., sep = " ", collapse = NULL)
>>    ## .Internal(paste(list(...), sep, collapse))
>>    ## <environment: namespace:base>
>>    ##
>>    ## Signatures:
>>    ##         ...
>>    ## target  "ANY"
>>    ## defined "ANY"
>>
>>    microbenchmark(fun0(lst), fun1(lst), times=10)
>>    ## Unit: milliseconds
>>    ##       expr        min         lq     median         uq        max neval
>>    ##  fun0(lst) 233.539585 234.592491 236.311209 237.268506 243.181123    10
>>    ##  fun1(lst)   4.564914   4.592996   4.642898   4.729009   5.492706    10
>>
>>    sessionInfo()
>>    ## R version 3.0.0 Patched (2013-04-04 r62492)
>>    ## Platform: x86_64-unknown-linux-gnu (64-bit)
>>    ##
>>    ## locale:
>>    ##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>    ##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>    ##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>    ##  [7] LC_PAPER=C                 LC_NAME=C
>>    ##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>    ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>    ##
>>    ## attached base packages:
>>    ## [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>    ## [8] base
>>    ##
>>    ## other attached packages:
>>    ## [1] IRanges_1.19.15      BiocGenerics_0.7.2   microbenchmark_1.3-0
>>    ##
>>    ## loaded via a namespace (and not attached):
>>    ## [1] stats4_3.0.0
>>
>>
>> Thanks,
>> Valerie
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

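The caching John describes above (a slow search for an inherited method on the first call with a given signature, then a single hash-table lookup on later calls) can be illustrated with a toy model. This is only a sketch and not the code the methods package uses: make_dispatcher, find_inherited, method_list and the cache environment are hypothetical names, and the class search is simplified to walking class(x) and then falling back to "ANY".

```r
# Toy model of a per-generic dispatch table. NOT R's actual S4 implementation;
# all names here are hypothetical and chosen for illustration only.
make_dispatcher <- function(method_list) {
    cache <- new.env(hash = TRUE)   # hash table: class name -> method found
    searches <- 0L                  # how many slow searches have been run

    # Slow path: walk the class vector, then fall back to "ANY".
    find_inherited <- function(cls_chain) {
        searches <<- searches + 1L
        for (cls in c(cls_chain, "ANY")) {
            m <- method_list[[cls]]
            if (!is.null(m)) return(m)
        }
        stop("no applicable method")
    }

    dispatch <- function(x) {
        key <- class(x)[1L]
        method <- cache[[key]]              # fast path: one hash lookup
        if (is.null(method)) {
            method <- find_inherited(class(x))
            assign(key, method, envir = cache)  # remember for later calls
        }
        method(x)
    }

    list(dispatch = dispatch, searches = function() searches)
}

d <- make_dispatcher(list(
    character = function(x) paste(x, collapse = "+"),
    ANY       = function(x) "default"
))
d$dispatch(c("a", "b"))   # first "character" call: search, then cache
d$dispatch(c("c", "d"))   # same class again: cache hit, no new search
d$searches()              # 1
```

In this model, repeated calls on the same class already cost only one environment lookup plus the function calls themselves, which is consistent with John's remark that caching the last signature is unlikely to be dramatically faster than the existing table lookup.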