[Bioc-devel] any interest in a BiocMatrix core package?

Peter Hickey peter.hickey at gmail.com
Mon May 21 23:01:40 CEST 2018


A belated follow-up on this thread.

I've created a minimal package and GitHub repo at
https://github.com/Bioconductor/MatrixGenerics; might I suggest we
move the discussion there for the time being?

I've created some issues already to discuss the main points. These
would really benefit from input by experts on S4 and the methods
package, as well as anyone invested in the original subject of the
thread.
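
For anyone new to the thread, here is a minimal sketch of what a
dispatching row-variance generic could look like, using only the methods
package. The generic name rowVars2, the toy class ScaledMatrix and the
apply()-based method are illustrative assumptions, not code from the
repository; a real method for ordinary matrices would presumably
delegate to matrixStats instead.

```r
library(methods)

# Hypothetical generic: dispatch on 'x' only, keeping the signature simple.
setGeneric("rowVars2", function(x, ...) standardGeneric("rowVars2"),
           signature = "x")

# Method for ordinary matrices. apply() is used here only to keep the
# sketch free of non-base dependencies.
setMethod("rowVars2", "matrix", function(x, ...) {
  apply(x, 1L, stats::var)
})

# Toy S4 class (an assumption for illustration): a matrix plus a scalar
# multiplier that is applied lazily.
setClass("ScaledMatrix", representation(data = "matrix", scale = "numeric"))

# Var(c * X) = c^2 * Var(X), so the scaled data never has to be realised.
setMethod("rowVars2", "ScaledMatrix", function(x, ...) {
  x@scale^2 * rowVars2(x@data, ...)
})

X <- matrix(c(1, 2, 3, 4, 5, 9), nrow = 2)
sm <- new("ScaledMatrix", data = X, scale = 2)

rowVars2(X)   # dispatches to the "matrix" method
rowVars2(sm)  # dispatches to the "ScaledMatrix" method
```

Restricting the signature to 'x' is one possible answer to the question
of how simple the generic signature should be; it is a design choice to
debate, not a settled decision.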

Cheers,
Pete

On 4 November 2017 at 19:33, Henrik Bengtsson
<henrik.bengtsson at gmail.com> wrote:
> As Peter points out, the 'matrixStats' package provides an API with
> plain functions - not generic functions.  This is intentional; the
> main purpose is to keep the overhead at an absolute minimum.
> This is also in line with the overall philosophy of 'matrixStats'
> where speed is maximized and memory usage is minimized to the point
> where you could not do much better even if you used native code.  The
> user should be able to call the same matrixStats function thousands of
> times even on rather small matrices without getting killed by overhead
> due to dispatching or internal copies, e.g. [toy example] resampling
> 'cols' B=10,000 times in calls such as
> `matrixStats::rowMeans2(X, cols = cols)`.  You can find extensive
> benchmark reports at
> https://github.com/HenrikBengtsson/matrixStats/wiki/Benchmark-reports.
>
> From my perspective, the role of 'matrixStats' in a software stack is
> a rather low-level one, where it can serve higher-level APIs that
> either replicate its API or reuse it internally, e.g. those that
> dispatch on S3 and S4 etc.  Peter's 'DelayedMatrixStats' is one
> example.
>
> On Thu, Nov 2, 2017 at 2:00 AM, Martin Maechler
> <maechler at stat.math.ethz.ch> wrote:
> [...]
>> Honestly, I (as co-maintainer of Matrix, principal maintainer
>>              for several years now)
>> had been a bit surprised and frustrated that the 'matrixStats'
>> initiative had started w/o any contact with the Matrix package
>> maintainers and initially has not ever tried to use Matrix
>> package classes or functionality
>> (and this is still the case now AFAICS).
>
> Oh no, I'm sorry that I/we've caused frustration with 'matrixStats'.
> I'm not sure I understand though - the overlap in API and
> functionality between 'matrixStats' and 'Matrix' is basically zero(?).
> I think of 'Matrix' as a higher-level package.  Do my comments above put
> it in a different light?  Or are you saying that what's in
> 'matrixStats' should really have been in 'Matrix'?
>
> All the best,
>
> Henrik
>
> On Fri, Nov 3, 2017 at 7:16 AM, Martin Morgan
> <martin.morgan at roswellpark.org> wrote:
>> On 11/02/2017 06:20 PM, Peter Hickey wrote:
>>>
>>> As Michael notes, I think the scope here is broader than considering S4
>>> generics for functions in base R. To summarise, I think we would be
>>> looking to have S4 generics for the following:
>>>
>>> - All(?) the row*/col* functions in matrixStats (NB: matrixStats uses
>>>   plain old functions with no S3 or S4, which I believe was to avoid
>>>   any overhead of method dispatch since it is explicitly targeting
>>>   ordinary matrix objects as input)
>>> - Potentially new row*/col* summaries (i.e. that don't currently exist
>>>   in matrixStats)
>>> - Perhaps moving from BiocGenerics the S4 generics defined in
>>>   R/matrix-summary.R?
>>> - Perhaps apply() (E.g., DelayedArray defines an S4 generic for this)
>>>
>>> Having these as part of base R or in a recommended package would be
>>> great, but of course comes with its own challenges. The alternative is
>>> a lightweight package, likely better hosted on CRAN than BioC to assist
>>> with wider adoption and integration with Matrix, matrixStats, and
>>> other non-BioC packages.
>>>
>>> As Michael notes, getting the generic signature 'right' will be important
>>> and there are undoubtedly other challenges ahead (I've started a TODO).
>>>
>>> Might Bioconductor open up a GitHub repo (MatrixGenerics?) where this can
>>> be discussed with accompanying code? I've made the skeleton of a
>>> MatrixGenerics package that I could upload to kick things off, along with
>>> adding my TODOs as Issues on GitHub for further discussion.
>>
>>
>> I did start this repository as a place to develop more concrete ideas; I
>> think that a Bioconductor MatrixGenerics solution would not be optimal, so I
>> think of this repository as a place to develop ideas rather than a precursor
>> to an actual package.
>>
>> I invited Pete as a Collaborator with 'Admin' privileges, so I think he
>> should be able to extend Collaborator invites to other interested parties.
>>
>> Martin
>>
>>
>>>
>>> Cheers,
>>> Pete
>>>
>>>
>>> On Thu, 2 Nov 2017 at 13:10 Michael Lawrence <lawrence.michael at gene.com>
>>> wrote:
>>>
>>>> I'm pretty sure we're also considering generics for functions that do not
>>>> exist in base R. Like rowVars() and colVars(). This sort of suggests that
>>>> matrixStats should be part of base R.
>>>>
>>>> As an aside, we should think about the signature on those implicit
>>>> generics. Should they really include na.rm and dims? The simpler the
>>>> signature, the easier to understand the API.
>>>>
>>>>
>>>> On Thu, Nov 2, 2017 at 10:38 AM, Martin Maechler
>>>> <maechler at stat.math.ethz.ch> wrote:
>>>>
>>>>
>>>>>>>>>> Martin Morgan <martin.morgan at roswellpark.org>
>>>>>>>>>>      on Thu, 2 Nov 2017 06:17:19 -0400 writes:
>>>>>
>>>>>
>>>>>      > On 11/02/2017 05:00 AM, Martin Maechler wrote:
>>>>>      >>>>>>> "ML" == Michael Lawrence <lawrence.michael at gene.com>
>>>>>      >>>>>>> on Wed, 1 Nov 2017 14:13:54 -0700 writes:
>>>>>      >>
>>>>>      >> > Probably way easier to add the generics to the Matrix
>>>>>      >> > package and everyone just depends on that.
>>>>>      >>
>>>>>      >> Yes!  It is 'Recommended' and comes with every R
>>>>>      >> installation, and has had many such matrix S4 methods in
>>>>>      >> place for > 10 years, notably for dealing with (large)
>>>>>      >> sparse matrices.
>>>>>      >>
>>>>>      >> Honestly, I (as co-maintainer of Matrix, principal
>>>>>      >> maintainer for several years now) had been a bit
>>>>>      >> surprised and frustrated that the 'matrixStats'
>>>>>      >> initiative had started w/o any contact with the Matrix
>>>>>      >> package maintainers and initially has not ever tried to
>>>>>      >> use Matrix package classes or functionality (and this is
>>>>>      >> still the case now AFAICS).
>>>>>      >>
>>>>>      >> I'm happy to coordinate with maintainers of bioc packages
>>>>>      >> about which generics (and classes !) to use and export,
>>>>>      >> etc.
>>>>>
>>>>>      > One issue is that Matrix is a relatively large package
>>>>>      > (well, I wonder if that's a reasonable statement, given
>>>>>      > the Bioc dependencies and data involved, but perhaps in
>>>>>      > general...) and hence 'overkill' to obtain a collection of
>>>>>      > generics. Is there any prospect for factoring out the
>>>>>      > definition of the generics from implementation of the
>>>>>      > methods?  Re-purposing stats4 ?
>>>>>
>>>>>      > Martin Morgan
>>>>>
>>>>> Hmm..  we have quite a few  setGenericImplicit()  statements in
>>>>> the methods package already, notably for  'colSums' and friends,
>>>>> and so other decent citizen packages do *NOT*  setGeneric() at
>>>>> all on these ... and of course, Matrix _is_ a decent citizen in
>>>>> the R package universe.
>>>>>
>>>>> Instead of re-purposing stats4, I'm pretty sure we should only
>>>>> consider what functions should be added to the implicit generics
>>>>> already provided by the 'methods' package itself.
>>>>>
>>>>> Could it be that (some of) you are not properly aware of
>>>>> implicit generics?
>>>>>
>>>>> If you start 'R --vanilla' you can say
>>>>>
>>>>>> implicitGeneric("colSums")
>>>>>
>>>>> standardGeneric for "colSums" defined from package "base"
>>>>>
>>>>> function (x, na.rm = FALSE, dims = 1, ...)
>>>>> standardGeneric("colSums")
>>>>> <bytecode: 0x6cb4798>
>>>>> <environment: 0x6cab560>
>>>>> Methods may be defined for arguments: x, na.rm, dims
>>>>> Use  showMethods("colSums")  for currently available ones.
>>>>> ---------
>>>>>
>>>>> so I think it is clear how *any* decent package has to define
>>>>> methods for colSums(), and if they do, there should not be any
>>>>> conflicts.
>>>>>
>>>>> I think the problem is with S3 methods, not with S4 ones, where
>>>>> the implicit generics, as I understand it, were made for dealing with
>>>>> several packages writing methods for the same generic without
>>>>> one of the packages taking precedence.
>>>>>
>>>>> Martin Mächler
>>>>>
>>>>>
>>>>>
>>>>>      >>
>>>>>      >> Best, Martin Maechler ETH Zurich (and R core team)
>>>>>      >>
>>>>>      >>
>>>>>      >>
>>>>>      >> > On Wed, Nov 1, 2017 at 1:59 PM, Hervé Pagès
>>>>>      >> > <hpages at fredhutch.org> wrote:
>>>>>      >>
>>>>>      >> >> That's probably a good idea but a clean solution would
>>>>>      >> >> need to involve all players, including the Matrix
>>>>>      >> >> package. Right now there are conflicts for some S4
>>>>>      >> >> generics defined in Matrix and in BiocGenerics
>>>>>      >> >> (e.g. rowSums). I'm not sure that moving rowSums from
>>>>>      >> >> BiocGenerics to a new MatrixGenerics package would
>>>>>      >> >> address this.  Unless MatrixGenerics is on CRAN and
>>>>>      >> >> Matrix depends on it ;-)
>>>>>      >> >>
>>>>>      >> >> How likely is this to happen?
>>>>>      >> >>
>>>>>      >> >> H.
>>>>>      >> >>
>>>>>      >> >>
>>>>>      >> [............]
>>>>>      >>
>>>>>      >> _______________________________________________
>>>>>      >> Bioc-devel at r-project.org mailing list
>>>>>      >> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>      >>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>


