[Bioc-devel] S4 Method Slow Execution if Signature Has Multiple Class Unions

Michael Lawrence
Fri Dec 3 20:21:17 CET 2021


Hi Dario,

Thanks for the reproducible example. The time is spent building the
table of all possible signatures the first time the generic is
dispatched, and that cost grows with the length of the signature
(eight arguments here). After the first dispatch, the table is cached
and subsequent calls are fast.
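
One way to see the first-dispatch cost separately from later calls is
to time the same call twice in a fresh session. The following is only
a minimal sketch: the class union `numericOrMissing` and the generic
`demoDispatch` are hypothetical names made up for illustration, and
with a signature this short the difference may be too small to notice.

```r
library(methods)

# Hypothetical class union and generic, used only for this illustration.
setClassUnion("numericOrMissing", c("numeric", "missing"))
setGeneric("demoDispatch", function(x, y, z) standardGeneric("demoDispatch"))

setMethod("demoDispatch",
          c("numericOrMissing", "numericOrMissing", "numericOrMissing"),
          function(x, y, z) {
            # Count how many of the arguments were actually supplied.
            sum(!missing(x), !missing(y), !missing(z))
          })

# First call with this combination of argument classes: dispatch has to
# work out which method applies.
first <- system.time(r1 <- demoDispatch(y = 1))

# Same call again: the dispatch result should now come from the cache.
second <- system.time(r2 <- demoDispatch(y = 1))

print(first)
print(second)
```

Swapping in your own generic and call for `demoDispatch(y = 1)` gives a
quick check of whether the time is spent only on the first dispatch.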

Have you considered just using an ordinary function as a constructor?
Usually, there is no need for polymorphism during construction, and
any that is necessary can be handled through explicit conditions. One
reason for this is that it is rare to desire extensibility of
construction across packages, since the constructor makes assumptions
about the internal structure of the class.

In this case, you are using dispatch to condition on argument
missingness, which I would argue is more clearly implemented just
using `if(missing())` and/or putting default values in the constructor
formals. There are very few valid use cases for generics with
signatures this long.
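
To make that concrete, here is a rough sketch of a constructor written
as an ordinary function. It is not a drop-in replacement for your
code: the slot types, the single consistent set of defaults, and the
`as.integer()` coercion are my assumptions, and only two of the
arguments are shown.

```r
library(methods)

# Simplified class for illustration; slot A is plain "character" here
# because missingness is handled in the constructor, not in the class.
setClass("ParamsSet", representation(A = "character", B = "integer"))

# Ordinary function as constructor: no generic, no dispatch.
ParamsSet <- function(A = c("L", "M", "N"), B) {
  # Default supplied through the formals; match.arg() picks the first
  # choice ("L") when A is not given and validates it otherwise.
  A <- match.arg(A)

  # Explicit condition on missingness instead of a "missing" signature.
  if (missing(B)) B <- 500L

  new("ParamsSet", A = A, B = as.integer(B))
}

p0 <- ParamsSet()           # all defaults
p1 <- ParamsSet(B = 999L)   # A still defaults to "L"
```

With this shape the defaults are visible in one place, and `A` gets the
same default whether or not other arguments are supplied.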

As an aside, the behavior (at least in this example) seems confusing.
For example, ParamsSet() uses 'M' as the default for `A`, while
specifying any parameter results in `A` defaulting to 'L'.

Hope this helps,
Michael




On Wed, Nov 24, 2021 at 3:02 AM Dario Strbenac via Bioc-devel
<bioc-devel using r-project.org> wrote:
>
> Hello,
>
> Thanks. It was difficult to pinpoint, but I was able to make a minimal example. It happens only if SummarizedExperiment is pre-loaded. The difference is 0.2 seconds versus 32 seconds on my modest Windows 10 laptop computer, roughly a 160-fold slowdown. Can you reproduce it?
>
> library(SummarizedExperiment)
>
> setClassUnion("characterOrMissing", c("character", "missing"))
> setClassUnion("integerOrMissing", c("integer", "missing"))
> setClass("ParamsSet", representation(A = "characterOrMissing", B = "integer"))
> setGeneric("ParamsSet", function(A, B, C, D, E, F, G, H) standardGeneric("ParamsSet"))
>
> setMethod("ParamsSet", c("missing", "missing", "missing", "missing", "missing", "missing", "missing", "missing"),
> function() # Empty constructor
> {
>   new("ParamsSet", A = 'M', B = 300L)
> })
>
> setMethod("ParamsSet", c("characterOrMissing", "integerOrMissing", "integerOrMissing", "integerOrMissing",
>                          "characterOrMissing", "integerOrMissing", "integerOrMissing", "integerOrMissing"),
> function(A = c('L', 'M', 'N'), B = 500, C = 100, D, E, F, G, H)
> {
>   if(missing(A)) A <- 'L' # Mimic match.arg.
>   if(missing(B)) B <- 500L # Hack to implement parameter defaults not specified by generic.
>   if(missing(C)) C <- 100L
>   new("ParamsSet", A = A, B = B)
> })
>
> system.time(ParamsSet(B = 999L)) # Slow or fast, depending on SummarizedExperiment presence.
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia



--
Michael Lawrence
Principal Scientist, Director of Data Science and Statistical Computing
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla using gene.com
