[Bioc-devel] Pushing towards a better home for matrix generics

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Tue Jan 29 18:43:25 CET 2019


>>>>> Pages, Herve 
>>>>>     on Tue, 29 Jan 2019 16:44:47 +0000 writes:

    > Hi Martin.  Speed is not the concern: I just did some
    > quick benchmarking and didn't observe any significant
    > difference in method dispatch performance after doing
    > setGeneric("toto", function(x, a=0, b=0, c=0)
    > standardGeneric("toto")) vs doing setGeneric("toto",
    > signature="x", function(x, a=0, b=0, c=0)
    > standardGeneric("toto")).

    > Here is the real concern to me:

    > Aliases of the form
    > \alias{colSums,dgCMatrix,ANY,ANY-method} are a real pain
    > to maintain. It's also a pain for the end user to have to
    > do ?`colSums,dgCMatrix,ANY,ANY-method` to access the man
    > page for a particular method. I know Matrix uses "short"
    > aliases i.e. aliases of the form
    > \alias{colSums,dgCMatrix-method} so the user only needs to
    > do ?`colSums,dgCMatrix-method` but there is a lot of
    > fragility to the situation.

    > Here is why: The exact form that needs to be used for
    > these aliases can change anytime depending on what happens
    > in one of the upstream packages (not a concern for the
    > Matrix package because no upstream package defines colSums
    > methods). More precisely: If all the colSums methods
    > defined in the upstream packages and in my own package are
    > defined with setMethod statements of the form:

    >     setMethod("colSums", signature(x="SomeClass"), ...)

    > then the aliases in the man pages must be of the form
    > \alias{colSums,SomeClass-method} and the end user can just
    > do ?`colSums,SomeClass-method`, which is good. But if
    > **one** upstream package decides to use a setMethod
    > statement of the form:

    >    setMethod("colSums", signature(x="SomeClass",
    > na.rm="ANY", dims="ANY"), ...)

    > then the aliases for **all** the colSums methods in
    > **all** the downstream packages now must be of the form
    > \alias{colSums,SomeOtherClass,ANY,ANY-method}, even if the
    > method for SomeOtherClass is defined with
    > signature(x="SomeOtherClass"). 

Hmm... but to me, the behavior you describe in the above paragraph
seems rather an implementation "infelicity" in R's help / documentation system,
than an intrinsic necessity.  Or have you thought more about
this and discussed it with other S4 experts (John Chambers,
Michael L., Martin Morgan, ...)  and came to a different conclusion?

Very generally:

  Just because the documentation (help system)
  rules are implemented as they are should *NOT* influence "the
  best way" to program things in R. 

and particularly for something such as S4 which has been adapted
and tuned for a long time ...

So, rather the documentation "setup" should adapt to what seems
best from an R coding point of view.

More specifically, if we are allowed to use short signatures in R code, i.e.,
      signature(x=<someClass>)
short for
      signature(x=<someClass>, na.m="ANY", dims="ANY")

then the documentation \alias{} should allow to use the same
principle, as the documentation / help "keys" which \alias{.} constructs
will be similarly uniquely determined
(at least as long as other packages do not describe methods for
 "my" <someClass>)

So, the help / documentation (and "R CMD check" checks) should
have been changed long ago, if you had sent patches to do so,
n'est-ce pas?   :-) ;-)  [[yes, half jokingly]].

Martin

    > Also, as a consequence, now
    > the end user has to use the long syntax to access the man
    > pages for these methods. And if later the author of the
    > upstream package decides to switch back to the
    > setMethod("colSums", signature(x="SomeClass"), ...) form,
    > then I have to switch back all the aliases in all my
    > downstream packages to the short form again!

    > This fragility of the alias syntax was one of the
    > motivations for me to put many setGeneric statements of
    > the form setGeneric("someGeneric", signature="x") in
    > BiocGenerics over the years. So I don't have many dozens
    > of aliases that suddenly break for mysterious reasons ('R
    > CMD check' would suddenly starts reporting warnings for
    > these aliases despite no changes in my package or in R).

    > Best,

    > H.

    > On 1/29/19 03:16, Martin Maechler wrote:
    >>>>>>> Michael Lawrence on Mon, 28 Jan 2019 20:47:58 -0800
    >>>>>>> writes:
    >> > That will have some consequences; for example, >
    >> documentation aliases will need to change. Not sure how >
    >> many packages will need to be fixed outside of Matrix,
    >> but > it's not an isolated change. Martin might comment
    >> on the > rationale for the full signature, since he
    >> defined those > generics.
    >> 
    >> > On Mon, Jan 28, 2019 at 7:21 PM Pages, Herve >
    >> <hpages using fredhutch.org> wrote:
    >> >>
    >> >> Actually there is a 4th solution which is to modify
    >> the >> definition of the implicit generics in the methods
    >> >> package (file makeBasicFunsList.R) to make them
    >> dispatch >> on their 1st arg only. Should be easy. Then
    >> no package >> will need to use a setGeneric statement >>
    >> anymore. Everybody will automatically get a clean >>
    >> implicit generic. Including the Matrix package which >>
    >> shouldn't need to be touched (except maybe for some >>
    >> aliases in its man page that might need to be changed >>
    >> from \alias{colSums,dgCMatrix,ANY,ANY-method} to >>
    >> \alias{colSums,dgCMatrix-method}).
    >> >>
    >> >> Anybody wants to try to make a patch for this?
    >> 
    >> >> H.
    >> 
    >> I've already replied without having read the above two
    >> messages.  In my reply I had indeed more or less argued
    >> as Hervé does above.
    >> 
    >> Michael, Hervé, .. : Why is it really so much better to
    >> disallow dispatch for the other compulsory arguments?
    >> Dispatch there allows to use methods for class "missing"
    >> which is nicer in my eyes than the traditional default
    >> argument + missing() "tricks".
    >> 
    >> Is it mainly speed you are concerned about.  If yes, do
    >> we have data (and data analysis) about performance here?
    >> 
    >> Martin
    >> 
    >> >>
    >> >> On 1/28/19 19:00, Michael Lawrence wrote: > I agree
    >> (2) >> is a good compromise. CC'ing Martin for his
    >> perspective.
    >> >> >
    >> >> > Michael
    >> >> >
    >> >> > On Mon, Jan 28, 2019 at 6:58 PM Pages, Herve >>
    >> <hpages using fredhutch.org> wrote: >> Hi Aaron,
    >> >> >>
    >> >> >> The 4 matrix summarization generics currently
    >> defined >> in BiocGenerics >> are defined as followed:
    >> >> >>
    >> >> >> setGeneric("rowSums", signature="x") >> >>
    >> setGeneric("colSums", signature="x") >> >>
    >> setGeneric("rowMeans", signature="x") >> >>
    >> setGeneric("colMeans", signature="x")
    >> >> >>
    >> >> >> The only reason for having these definitions in >>
    >> BiocGenerics is to >> restrict dispatch the first >>
    >> argument. This is cleaner than what we would >> get with
    >> >> the implicit generics where dispatch is on all
    >> arguments >> (it >> doesn't really make sense to dispatch
    >> on toggles >> like 'na.rm' or >> 'dims'). Sticking to
    >> simple dispatch >> when possible makes life easier for >>
    >> the developer >> (especially in times of troubleshooting)
    >> and for the user >> >> (methods are easier to discover
    >> and their man pages >> easier to access).
    >> >> >>
    >> >> >> However, the 4 statements above create new generics
    >> >> that mask the >> implicit generics defined in the
    >> Matrix >> package (Matrix doesn't contain >> any
    >> setGeneric >> statements for these generics, only
    >> setMethod >> >> statements). This is a very unsatisfying
    >> situation and it >> has hit me >> repeatedly over the
    >> last couple of years.
    >> >> >>
    >> >> >> We have basically 3 ways to go. From simpler to
    >> more >> complicated:
    >> >> >>
    >> >> >> 1) Give up on single dispatch for these
    >> generics. That >> is, we remove the >> 4 statements above
    >> from >> BiocGenerics. Then we use setMethod() in package
    >> >> code >> like Matrix does.
    >> >> >>
    >> >> >> 2) Convince the Matrix folks to put the 4
    >> statements >> above in Matrix.  >> Then any BioC package
    >> that needs to >> define methods for these generics >>
    >> would just need to >> import them from the Matrix
    >> package. Maybe we could >> >> even push this one step
    >> further by having BiocGenerics >> import and >> re-export
    >> these generics. This would make >> them "available" in
    >> BioC as >> soon as the BiocGenerics >> is loaded (and any
    >> package that needs to define >> >> methods on them would
    >> just need to import them from >> BiocGenerics).
    >> >> >>
    >> >> >> 3) Put the 4 statements above in a MatrixGenerics
    >> >> package. Then convince >> the Matrix folks to define
    >> >> methods on the generics defined in >> >>
    >> MatrixGenerics. Very unlikely to happen!
    >> >> >>
    >> >> >> IMO 2) is the best compromise. Want to give it a
    >> shot?
    >> >> >>
    >> >> >> H.
    >> >> >>
    >> >> >>
    >> >> >> On 1/27/19 13:45, Aaron Lun wrote: >>> This is a >>
    >> resurrection of some old threads:
    >> >> >>>
    >> >> >>>
    >> >>
    >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_pipermail_bioc-2Ddevel_2017-2DNovember_012273.html&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk&s=pcpUyjpkQe6U79lZ_n2SANNp6Zj_s6i1Sq2yZx2NSjw&e=
    >> >> >>>
    >> >> >>>
    >> >>
    >> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_Bioconductor_MatrixGenerics_issues&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk&s=NrmcVnmvgkDp3p64J-izZU9VD5nFsFCWOTI-TsnzCpY&e=
    >> >> >>>
    >> >> >>> For those who are unfamiliar with this, the basic
    >> >> issue is that various >>> Matrix and BiocGenerics >>
    >> functions mask each other. This is mildly >>> frustrating
    >> >> in interactive sessions:
    >> >> >>>
    >> >> >>>> library(Matrix) >>>> library(DelayedArray) >>>> x
    >> <- >> rsparsematrix(10, 10, 0.1) >>>> colSums(x) # fails
    >> >>>> >> Matrix::colSums(x) # okay >>> ... but quite
    >> annoying >> during package development, requiring code
    >> like >>> this:
    >> >> >>>
    >> >> >>> if (is(x, "Matrix")) { >>> z <- Matrix::colSums(x)
    >> >> >>> } else { >>> z <- colSums(x) # assuming
    >> DelayedArray >> does the masking.  >>> }
    >> >> >>>
    >> >> >>> ... which defeats the purpose of using S4 dispatch
    >> in >> the first place.
    >> >> >>>
    >> >> >>> I have been encountering this issue with
    >> increasing >> frequency in my >>> packages, as a lot of
    >> my code base >> needs to be able to interface with >>>
    >> both Matrix and >> Bioconductor objects (e.g.,
    >> DelayedMatrices) at the >>> >> same time. What needs to
    >> happen so that I can just write:
    >> >> >>>
    >> >> >>> z <- colSums(x)
    >> >> >>>
    >> >> >>> ... and everything will work for both Matrix and
    >> >> Bioconductor classes?  >>> It seems that many of these
    >> >> function names are implicit generics >>> anyway, can
    >> >> BiocGenerics take advantage of that for the time
    >> being?
    >> >> >>>
    >> >> >>> Best,
    >> >> >>>
    >> >> >>> Aaron
    >> >> >>>
    >> >> >>> _______________________________________________
    >> >>> >> Bioc-devel using r-project.org mailing list >>> >>
    >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=O21AQgvbUp3XRwM4jf0WeZA2ePj9yT3fc2X5hOsKNJk&s=JtgGBnaZJH44fV8OUp-SwnHxhD_i_mdVkqoMfUoA5tM&e=
    >> >> >> --
    >> >> >> Hervé Pagès
    >> >> >>
    >> >> >> Program in Computational Biology >> Division of
    >> Public >> Health Sciences >> Fred Hutchinson Cancer
    >> Research Center >> >> 1100 Fairview Ave. N, M1-B514 >>
    >> P.O. Box 19024 >> >> Seattle, WA 98109-1024
    >> >> >>
    >> >> >> E-mail: hpages using fredhutch.org >> Phone: (206)
    >> 667-5791 >> >> Fax: (206) 667-1319
    >> >> >>
    >> >> >> _______________________________________________ >>
    >> >> Bioc-devel using r-project.org mailing list >> >>
    >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIFaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=c-Mmi30ouubEEHC5W9_X6DIwxblt1nQlIfgCaK8uCJU&s=U8Hu1kzglD_RP7t_eR5w_nYAIaupBgrEKx11geSZwVg&e=
    >> >>
    >> >> --
    >> >> Hervé Pagès
    >> >>
    >> >> Program in Computational Biology Division of Public >>
    >> Health Sciences Fred Hutchinson Cancer Research Center >>
    >> 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA
    >> >> 98109-1024
    >> >>
    >> >> E-mail: hpages using fredhutch.org Phone: (206) 667-5791
    >> Fax: >> (206) 667-1319
    >> >>

    > -- 
    > Hervé Pagès

    > Program in Computational Biology Division of Public Health
    > Sciences Fred Hutchinson Cancer Research Center 1100
    > Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA
    > 98109-1024

    > E-mail: hpages using fredhutch.org Phone: (206) 667-5791 Fax:
    > (206) 667-1319



More information about the Bioc-devel mailing list