[Bioc-devel] avoiding clashes of different S4 methods with the same generic

Michael Lawrence lawrence.michael at gene.com
Tue Apr 26 22:47:56 CEST 2016


On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun <alun at wehi.edu.au> wrote:
> Dear List,
>
> When an S4 method for the same class is defined in two separate packages
> (i.e., under the same generic), and both packages are loaded into an R
> session, it seems that the method from the package loaded later clobbers the
> method from the package loaded first. Is it possible to specifically call
> the method in the first package when both packages are loaded? If not, how
> should we protect against this?
>
> To give some context: the csaw package currently defines a normalize()
> method for SummarizedExperiment objects, using the generic from
> BiocGenerics. However, if some other hypothetical package (I'll call it
> "swings", for argument's sake) were to define a normalize() method with a SE
> signature, and if the swings package were to be loaded after csaw, then it
> seems that all calls to normalize() would use the method defined by swings,
> rather than that defined by csaw.
>
> Now, for usual functions, disambiguation would be easy with "::", but I
> don't know whether this can be done in the S4 system, given that the details
> of dispatch are generally hidden away. The only solution I can see is for
> csaw (and/or swings) to define a SE subclass; define the normalize() method
> using the subclass as the signature, such that S4 dispatch will now go to
> the correct method; and hope that no other package redefines normalize() for
> the subclass.
>
> Is this what I should be doing routinely, i.e., define subclasses and
> methods for those subclasses in all my packages? Or am I missing something
> obvious? I would have expected such clashes to be more of a problem, given
> how many new packages are being added to BioC at every release.
>

I would recommend against defining subclasses of basic data structures
that differ only in their behavior. The purpose of
SummarizedExperiment is to store data. One might use inheritance to
modify how the data are stored, or to store new types of data,
although the latter may be best addressed through composition.

To extend behavior, define methods. The generic represents the verb
and thus the semantics of the operation. In general, method conflicts
indicate that the design is broken. In this case, the normalize()
generic has a very general name. There is no one way to "normalize" a
SummarizedExperiment. It would be difficult for the reader to
understand such ambiguous code. To indicate a specific normalization
algorithm, we either need a more specific generic or we need to
parameterize it further.
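
To make the conflict concrete, here is a minimal sketch run in a single
session. It only mimics what happens when two packages register a method
for the same signature. "ToySE" and the locally defined normalize()
generic are stand-ins I made up so the example is self-contained; they are
not SummarizedExperiment or the BiocGenerics generic.

```r
library(methods)

# Hypothetical stand-in for SummarizedExperiment.
setClass("ToySE", representation(counts = "numeric"))

# Hypothetical stand-in for the shared generic.
setGeneric("normalize", function(object, ...) standardGeneric("normalize"))

# What the first package ("csaw") might register.
setMethod("normalize", "ToySE", function(object, ...) "csaw")

se <- new("ToySE", counts = c(1, 2, 3))
first <- normalize(se)

# What a second package ("swings") might register later for the same
# signature. This replaces the entry in the generic's method table.
setMethod("normalize", "ToySE", function(object, ...) "swings")
second <- normalize(se)
```

After the second setMethod() call the first definition can no longer be
reached through the generic, which is why the signature alone cannot
distinguish the two algorithms.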

One way to make more specific generics would be to give them the same
name, "normalize", but define them in different namespaces and require
:: qualification. That would mean abandoning the BiocGenerics generic
and it would only work if each package provides only one way to
normalize. Or, one could give them different names, but it would be
difficult to select a natural name, and it's not clear whether the
abstract notion of normalization should be always coupled with the
method.

A more flexible/modular approach would be to augment the signature of
BiocGenerics::normalize to indicate a normalization method and rely on
dual-dispatch:

normalize(se, WithSwings())
normalize(se, WithCSaw())

Roughly, one example of this approach is
VariantAnnotation::locateVariants() and its variant type argument.
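
A rough sketch of what that could look like follows. The two-argument
generic, the "ToySE" class, and the slots on WithCSaw and WithSwings are
all assumptions made for illustration; the real BiocGenerics::normalize
generic would have to be changed to dispatch on a second argument, and the
arithmetic is a placeholder, not either package's algorithm.

```r
library(methods)

# Hypothetical stand-in for SummarizedExperiment.
setClass("ToySE", representation(counts = "numeric"))

# Hypothetical method classes, one per package. Each can carry its own
# parameters as slots. setClass() returns a generator function, so
# WithCSaw() and WithSwings() construct instances.
WithCSaw <- setClass("WithCSaw",
                     representation(total = "numeric"),
                     prototype(total = 1))
WithSwings <- setClass("WithSwings",
                       representation(peak = "numeric"),
                       prototype(peak = 1))

# Assumed generic that dispatches on both the data and the method object.
setGeneric("normalize",
           function(object, method, ...) standardGeneric("normalize"))

setMethod("normalize", signature("ToySE", "WithCSaw"),
          function(object, method, ...) {
              # Placeholder: scale counts to a fixed total.
              object@counts <- method@total * object@counts / sum(object@counts)
              object
          })

setMethod("normalize", signature("ToySE", "WithSwings"),
          function(object, method, ...) {
              # Placeholder: scale counts to a fixed maximum.
              object@counts <- method@peak * object@counts / max(object@counts)
              object
          })

se <- new("ToySE", counts = c(1, 2, 5))
by_csaw <- normalize(se, WithCSaw())
by_swings <- normalize(se, WithSwings())
```

Each package then owns its method class, so the two methods have distinct
signatures and cannot overwrite each other.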

The affy package (or something around it) auto-qualifies the generic
via a method argument; something like S3 around S4. For example
normalize(se, "swings") would call normalize.swings(se), where
normalize.swings itself could be generic. Another way to effect
cascading dispatch is through composition, where the method object
either is a function or can provide one to implement the normalization
(emulating message passing OOP), which would allow normalize() to
be implemented simply as:

normalize <- function(x, method, ...) normalizer(method)(x, ...)
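
To show how the pieces around that one-liner might fit together, here is a
small sketch. The "Normalizer" class, its "fun" slot, and the constructor
functions are hypothetical, and plain numeric vectors stand in for the
data to keep it short.

```r
library(methods)

# Hypothetical method object that carries its implementation in a slot.
setClass("Normalizer", representation(fun = "function"))

# Returns the function held by a method object.
normalizer <- function(method) method@fun

# The one-liner from above: ask the method object for its function and
# apply it to the data.
normalize <- function(x, method, ...) normalizer(method)(x, ...)

# Each package would export its own constructor (placeholder arithmetic).
WithCSaw <- function() new("Normalizer", fun = function(x, ...) x / sum(x))
WithSwings <- function() new("Normalizer", fun = function(x, ...) x / max(x))

x <- c(1, 2, 5)
by_csaw <- normalize(x, WithCSaw())
by_swings <- normalize(x, WithSwings())
```

Here normalize() needs no knowledge of the individual algorithms; a new
package only has to supply another method object.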

One issue is that the syntax is a bit unconventional and users might
end up preferring the affy approach, with a normalize_csaw() and
normalize_swings(). But I like the modular, dynamic approach outlined
above.
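
For comparison, the string-qualified style could be sketched roughly like
this. It is my approximation of the idea, not the affy package's actual
code, and the function bodies are placeholders.

```r
# Package-specific functions; each could itself be an S4 generic.
normalize.csaw <- function(x, ...) x / sum(x)
normalize.swings <- function(x, ...) x / max(x)

# Front end that builds the function name from the method string and
# looks it up. Lookup fails with an error if no such function exists.
normalize <- function(x, method, ...) {
    f <- get(paste0("normalize.", method), mode = "function")
    f(x, ...)
}

x <- c(1, 2, 5)
by_csaw <- normalize(x, "csaw")
by_swings <- normalize(x, "swings")
```

The lookup is by name only, so nothing ties the string "swings" to the
package that provides normalize.swings.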

Thoughts?

Michael

> Cheers,
>
> Aaron


