[Bioc-devel] avoiding clashes of different S4 methods with the same generic

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Wed Apr 27 14:57:19 CEST 2016


I agree with Herve that making a subclass (is that the technical term?)
might be a good idea.  This is what I would do.  And this is what has been
done historically for all the extensions of ExpressionSet and eSet from
Biobase.  The issue with a general class like SummarizedExperiment is that
methods defined on that class really ought to make sense for any instance
of the class, and since the class can contain basically any form of data, I
don't see how one can define any kind of informed data processing. For
example, SummarizedExperiment can contain any number of different assays,
and what happens with all of those when you call normalize()?  By making a
subclass you can say that this is an object which contains a very
specific form of data (which can be checked by extending the validity
check), and you can now start to think about data processing, while still
benefiting from the general infrastructure.
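
To make the subclass idea concrete, here is a minimal sketch. It does not
use the real SummarizedExperiment: ToyExperiment, ToyCountExperiment and
scaleCounts() are made-up names, and the "one assay called counts" rule is
only an assumed example of a more specific form of data. The point is just
that a "straight" subclass (no new slots) with a stricter validity check
gives a method something definite to work on.

```r
library(methods)

# Toy stand-in for a very general container (NOT SummarizedExperiment):
# it holds a named list of assay matrices of any kind.
setClass("ToyExperiment", slots = c(assays = "list"))

# "Straight" subclass: no new slots, only a stricter validity check.
setClass("ToyCountExperiment", contains = "ToyExperiment",
    validity = function(object) {
        a <- object@assays
        if (length(a) != 1L || !identical(names(a), "counts"))
            return("expected exactly one assay named 'counts'")
        if (!is.numeric(a$counts) || any(a$counts < 0, na.rm = TRUE))
            return("'counts' must be numeric and non-negative")
        TRUE
    })

# Hypothetical generic with a descriptive name, defined locally.
setGeneric("scaleCounts",
    function(object, ...) standardGeneric("scaleCounts"))

# The method exists only for the subclass, where its meaning is clear:
# divide every column of 'counts' by its column total.
setMethod("scaleCounts", "ToyCountExperiment", function(object, ...) {
    m <- object@assays$counts
    object@assays$counts <- sweep(m, 2, colSums(m), "/")
    object
})

x <- new("ToyCountExperiment",
         assays = list(counts = matrix(c(1, 3, 2, 2), nrow = 2)))
y <- scaleCounts(x)
```

Trying to build a ToyCountExperiment from two arbitrary assays fails the
validity check, and calling scaleCounts() on a plain ToyExperiment finds no
method, which is the behaviour argued for above.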

But I would also say that normalize() is a terrible choice of name for a
method, since it is basically maximally uninformative about what happens.

Best,
Kasper

On Wed, Apr 27, 2016 at 7:24 AM, Michael Lawrence <lawrence.michael at gene.com> wrote:

> On Tue, Apr 26, 2016 at 11:12 PM, Hervé Pagès <hpages at fredhutch.org>
> wrote:
> > Hi,
> >
> > I would not discard defining a SummarizedExperiment subclass so quickly.
> > SummarizedExperiment is very generic and can contain any kind of data.
> > IIUC the csaw package uses SummarizedExperiment to store a particular
> > kind of data (ChIP-seq data) and I believe specialization is a
> > legitimate situation for defining a subclass, even if the subclass is
> > a "straight" subclass i.e. a subclass that doesn't add new slots or
> > doesn't touch the existing slots.
> >
> > OTOH introducing a "straight" subclass only to define one specialized
> > method on it (the "normalize" method in this case) might not be worth
> > it since there is a cost for such a class, even if that cost is minimal:
> > a cost for the user (one new container/constructor to deal with) and a
> > cost for the developer (e.g. multiplication of coerce methods).
> >
>
> If the data are more specialized, specialize the data structure, but
> the fact that the specialization solves the normalize() ambiguity is a
> mere coincidence. There are two different concerns.
>
> > Changing the signature of the normalize() generic in BiocGenerics and
> > introducing dual dispatch is of course doable but that means the
> > maintainers of the packages that define methods on this generic are
> > ok with the dual dispatch game and are willing to make the required
> > modifications to their packages. It's an important change and I don't
> > see an easy way to make it happen smoothly (i.e. thru a
> > deprecated/defunct cycle).
> >
>
> In conjunction with what Martin said, you could define a
> "ANY","missing" method that emits a deprecation warning, and then
> recall the generic using NULL or something for the second argument so
> that it falls through. Packages would only need to fix the formals of
> their method definition.
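
[A minimal sketch of the fall-through idea in the quoted paragraph above.
Everything here is an assumption for illustration: toyNormalize() is a local
generic, not BiocGenerics::normalize(), and ToyData, CenterParam and
ScaleParam are made-up classes. Registering the legacy behaviour on a "NULL"
second argument is one possible reading of "recall the generic using NULL",
not necessarily what was intended.]

```r
library(methods)

# Local generic that dispatches on two arguments.
setGeneric("toyNormalize",
    function(object, param, ...) standardGeneric("toyNormalize"))

setClass("ToyData", slots = c(values = "numeric"))
setClass("CenterParam", slots = c(center = "numeric"))
setClass("ScaleParam", slots = c(scale = "numeric"))

# New-style methods: the second argument selects the procedure.
setMethod("toyNormalize", c("ToyData", "CenterParam"),
    function(object, param, ...) {
        object@values <- object@values - param@center
        object
    })
setMethod("toyNormalize", c("ToyData", "ScaleParam"),
    function(object, param, ...) {
        object@values <- object@values / param@scale
        object
    })

# Legacy behaviour, now reached through a NULL second argument.
setMethod("toyNormalize", c("ToyData", "NULL"),
    function(object, param, ...) {
        object@values <- object@values - mean(object@values)
        object
    })

# Old one-argument calls land here, warn, and fall through via NULL.
setMethod("toyNormalize", c("ANY", "missing"),
    function(object, param, ...) {
        warning("calling toyNormalize() without 'param' is deprecated",
                call. = FALSE)
        toyNormalize(object, NULL, ...)
    })

x <- new("ToyData", values = c(1, 2, 3))
centered <- toyNormalize(x, new("CenterParam", center = 1))
scaled   <- toyNormalize(x, new("ScaleParam", scale = 2))
# The old call style still works; the warning is suppressed here only
# to keep the example quiet.
legacy   <- suppressWarnings(toyNormalize(x))
```

[In this sketch an existing package would only add the 'param' formal to its
method and register it for the NULL case; new procedures are added by
writing further methods on the second argument.]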
>
> > Here is the list of packages that currently define methods for
> > BiocGenerics::normalize():
> >
> >   affyPLM
> >   Cardinal
> >   codelink
> >   CopyNumber450k
> >   csaw
> >   diffHic
> >   EBImage
> >   epigenomix
> >   MSnbase
> >   oligo
> >   qpcrNorm
> >   scran
> >
> > [Interestingly the scran package defines a default "normalize" method
> > (i.e. a normalize,ANY method)].
> >
> > Whether we make the second argument lightweight or parameterized (which
> > is something that would need to be decided at the level of the generic)
> > these packages will break as soon as we change the signature of the
> > generic. So we'll need to wait after the release before this happens.
> >
> > Personally I find the lightweight second argument not particularly
> > intuitive, elegant, or user-friendly. I'd rather type
> > normalizeSwing(se, ...) or normalize(se, SwingParam(...)) than
> > normalize(se, WithSwing(), ...).
> >
>
> Sure, WithSwing() could hold arguments as well, but I agree that the
> Param suffix is more consistent. The Param naming is not great for
> autocompletion. Though I guess the interface could provide hints based
> on the defined methods.
>
> > Last thing: In case of a parameterized second argument, do we really
> > need a virtual normalizeParam class as parent of all the concrete
> > normalizeParam* classes? If so then I guess we would need to have it
> > defined in BiocGenerics but I think we should try hard to not start
> > defining classes in this package (that could take us too far...)
> >
>
> I would say no, no real need for a base class.
>
> > H.
> >
> >
> > On 04/26/2016 03:03 PM, Aaron Lun wrote:
> >>
> >> Yes, but "monkeyBars" doesn't have quite the same pithiness for a
> >> package name.
> >>
> >> Anyway, the dual dispatch mechanism sounds most interesting. I assume
> >> that means we'd have to define some sort of base "normalizeParam" class,
> >> and then derive "csawNormParam" and "swingsNormParam" subclasses, so
> >> that specific methods can be defined for each signature.
> >>
> >> - Aaron
> >>
> >> Martin Morgan wrote:
> >>>
> >>>
> >>> On 04/26/2016 05:28 PM, Michael Lawrence wrote:
> >>>>
> >>>> On Tue, Apr 26, 2016 at 2:16 PM, Martin Morgan
> >>>> <martin.morgan at roswellpark.org>  wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 04/26/2016 04:47 PM, Michael Lawrence wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Tue, Apr 26, 2016 at 11:00 AM, Aaron Lun <alun at wehi.edu.au> wrote:
> >>> ...
> >>>>>>>>>> BiocGenerics. However, if some other hypothetical package
> >>>>>>>>>> (I'll call it "swings", for argument's sake) were to define a
> >>>>>>>>>> normalize() method with a
> >>> ...
> >>>>>> I like the dual dispatch method quite a bit (but wonder why we get
> >>>>>> several swings but only one csaw? Maybe a csaw implies two
> >>>>>> participants [though I think I once in a while csaw-ed alone], so a
> >>>>>> singular csaw and a pair of swings balance out?), partly because it's
> >>>>>> very easy to extend (write another method) and the second argument
> >>>>>> can be either lightweight or parameterized.
> >>>>>>
> >>>> I could go along with the dual dispatch. "Swings" is short for "Set of
> >>>> swings". Usually, there are several swings in a row, but only one
> >>>> see-saw.
> >>>>
> >>>
> >>> Googling for "how many swings per see-saw" took me to
> >>>
> >>>    https://www.cpsc.gov//PageFiles/108601/playgrnd.pdf
> >>>
> >>> where it is apparent that swings are much more dangerous than see-saws
> >>> (e.g., 51 matches for "swing" versus 4 for "see-saw"; "Swings ... were
> >>> involved in about 19 ... percent of injuries ... See-saws accounted
> >>> for about three percent"; "Homemade rope, tire, or tree swings were
> >>> also involved in a number of hanging deaths" [no mention of death by
> >>> see-saw]).
> >>>
> >>> I think for the sake of our users, especially our younger users, we do
> >>> not want to consider swings, or even methods on swings, further.
> >>>
> >>> Martin
> >>>
> >>>
> >>
> >>
> >> _______________________________________________
> >> Bioc-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fredhutch.org
> > Phone:  (206) 667-5791
> > Fax:    (206) 667-1319
> >
>
