[Bioc-devel] Virtual class for `matrix` and `DelayedArray`? (or better strategy for dealing with them both)

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Wed May 2 13:26:33 CEST 2018


We did this recently in minfi and getting acceptable performance was not
trivial. We are still sorting out issues; we will eventually write about
our experiences. Pete is working on some blog posts on this and we hope to
do something more formal later.

We would all love experience from more people on this, but just be aware
that right now it involves pain.

Best,
Kasper

On Wed, May 2, 2018 at 2:13 AM, Elizabeth Purdom <epurdom at stat.berkeley.edu>
wrote:

> Thanks Hervé and Stephanie for your suggestions. I am really looking for a
> S4 methods solution however, given how my package is already set up. Also,
> I have several functions that I need to adapt in this way, so it seems
> cleaner and simpler to do the class union, which sounds like is not a
> problem  — for Stephanie’s solution, for each function I’d have to write 2
> S4methods and an internal function which feels more cluttered code to
> maintain for me. And it sounds like there’s not a virtual class I could use
> instead so I am correct to defining it myself.
>
> In terms of the setClassUnion, I chose “DelayedArray” because I wanted to
> capture HDF5Matrix and DelayedMatrix, but I now see that HDF5Matrix
> inherits from DelayedMatrix. I must have missed that somehow.
>
> Thanks,
> Elizabeth
>
> > On Apr 30, 2018, at 8:35 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
> >
> > The class union should probably be:
> >
> >  setClassUnion("matrixOrDelayed", c("matrix", "DelayedMatrix"))
> >
> > i.e. use DelayedMatrix instead of DelayedArray.
> >
> > So in addition to the class union and to Stephanie's solution, which
> > IMO are both valid solutions, you could also go for something like this:
> >
> > myNewRowMeans <- function(x,...)
> > {
> >    if (length(dim(x)) != 2)
> >        stop("'x' must be a matrix-like object")
> >    ...
> > )
> >
> > that is, just a regular function that checks that 'x' is matrix-like
> > based on its number of dimensions. If you really want to restrict to
> > matrix and DelayedMatrix only, replace the test with
> >
> >    if (!(is.matrix(x) || is(x, "DelayedMatrix")))
> >        stop("'x' must be a matrix or DelayedMatrix object")
> >
> > The difference being that now the function will reject matrix-like
> > objects that are not matrix or DelayedMatrix objects (e.g. a Matrix
> > derivative from the Matrix package).
> >
> > Cheers,
> > H.
> >
> >
> > On 04/30/2018 09:29 AM, Stephanie M. Gogarten wrote:
> >> Rather than a class union, how about an internal function that is
> called by the methods for both matrix and DelayedArray:
> >> setGeneric("myNewRowMeans", function(x,...) { standardGeneric("
> myNewRowMeans")})
> >> #' @importFrom DelayedArray rowMeans
> >> .myNewRowMeans <- function(x,...){
> >>     # a lot of code independent of x
> >>     print("This is a lot of code shared regardless of class of x\n")
> >>     # a lot of code that depends on x, but is dispatched by the
> functions called
> >>     out<-rowMeans(x)
> >>     #a lot of code based on output of out
> >>     out<-out+1
> >>     return(out)
> >> }
> >> setMethod("myNewRowMeans",
> >>           signature = "matrix",
> >>           definition = function(x,...){
> >>               .myNewRowMeans(x,...)
> >>           }
> >> )
> >> setMethod("myNewRowMeans",
> >>           signature = "DelayedArray",
> >>           definition = function(x,...){
> >>               .myNewRowMeans(x,...)
> >>           }
> >> )
> >> On 4/30/18 9:10 AM, Tim Triche, Jr. wrote:
> >>> But if you merge methods like that, the error method can be that much
> more
> >>> difficult to identify. It took a couple of weeks to chase that bug down
> >>> properly, and it ended up down to rowMeans2 vs rowMeans.
> >>>
> >>> I suppose the merged/abstracted method allows to centralize any such
> >>> dispatch into one place and swap out ill-behaved methods once
> identified,
> >>> so as long as DelayedArray/DelayedMatrixStats quirks are
> >>> documented/understood, maybe it is better to create this union class?
> >>>
> >>> The Matrix/matrixStats/DelayedMatrix/DelayedMatrixStats situation has
> been
> >>> "interesting" in practical terms, as seemingly simple abstractions
> appear
> >>> to require more thought. That was my only point.
> >>>
> >>>
> >>> --t
> >>>
> >>> On Mon, Apr 30, 2018 at 11:28 AM, Martin Morgan <
> >>> martin.morgan at roswellpark.org> wrote:
> >>>
> >>>> But that issue will be fixed, so Tim's advice is inappropriate.
> >>>>
> >>>>
> >>>> On 04/30/2018 10:42 AM, Tim Triche, Jr. wrote:
> >>>>
> >>>>> Don't do that.  Seriously, just don't.
> >>>>>
> >>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.
> com_Bioconductor_DelayedArray_issues_16&d=DwIDaQ&c=
> eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=
> Rhy4i6H9xaY8HzWv9v_jhOnp5OyEpJcG52RP3nHorU8&s=
> olbErqY3_l7i45-WeTkaUNGalrQQr-7i59rhJVF6OGQ&e=
> >>>>>
> >>>>> --t
> >>>>>
> >>>>> On Mon, Apr 30, 2018 at 10:02 AM, Elizabeth Purdom <
> >>>>> epurdom at stat.berkeley.edu> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>>
> >>>>>> I am trying to extend my package to handle `HDF5Matrix` class ( or
> more
> >>>>>> generally `DelayedArray`). I currently have S4 functions for
> `matrix`
> >>>>>> class. Usually I have a method for `SummarizedExperiment`, which
> will
> >>>>>> call
> >>>>>> call the method on `assay(x)` and I want the method to be able to
> deal
> >>>>>> with
> >>>>>> if `assay(x)` is a `DelayedArray`.
> >>>>>>
> >>>>>> Most of my functions, however, do not require separate code
> depending on
> >>>>>> whether `x` is a `matrix` or `DelayedArray`. They are making use of
> >>>>>> existing functions that will make that choice for me, e.g. rowMeans
> or
> >>>>>> subsetting. My goal right now is compatibility, not cleverness, and
> I'm
> >>>>>> not
> >>>>>> creating HDF5 methods to handle other cases. (If something doesn't
> >>>>>> currently exist, then I just enclose `x` with `data.matrix` or
> >>>>>> `as.matrix`
> >>>>>> and call the matrix into memory — for cleanliness and ease in
> updating
> >>>>>> with
> >>>>>> appropriate methods in future, I could make separate S4 functions
> for
> >>>>>> these
> >>>>>> specific tasks to dispatch, but that's outside of the scope of my
> >>>>>> question). So for simplicity assume I don't really need to dispatch
> *my
> >>>>>> code* -- that the methods I'm going to use do that.
> >>>>>>
> >>>>>> The natural solution for me seem to use `setClassUnion` and I was
> >>>>>> wondering if such a virtual class already exists? Or is there a
> better
> >>>>>> way
> >>>>>> to handle this?
> >>>>>>
> >>>>>> Here's a simple example, using `rowMeans` as my example:
> >>>>>>
> >>>>>> ```
> >>>>>> setGeneric("myNewRowMeans", function(x,...) { standardGeneric("
> >>>>>> myNewRowMeans")})
> >>>>>> setClassUnion("matrixOrDelayed",members=c("matrix",
> "DelayedArray"))
> >>>>>>
> >>>>>> #' @importFrom DelayedArray rowMeans
> >>>>>> setMethod("myNewRowMeans",
> >>>>>>             signature = "matrixOrDelayed",
> >>>>>>             definition = function(x,...){
> >>>>>>                           # a lot of code independent of x
> >>>>>>                           print("This is a lot of code shared
> regardless
> >>>>>> of
> >>>>>> class of x\n")
> >>>>>>                           # a lot of code that depends on x, but is
> >>>>>> dispatched by the functions called
> >>>>>>                           out<-rowMeans(x)
> >>>>>>                           #a lot of code based on output of out
> >>>>>>                           out<-out+1
> >>>>>>                           return(out)
> >>>>>>                   }
> >>>>>> )
> >>>>>> ```
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Bioc-devel at r-project.org mailing list
> >>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
> ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=
> BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=Rhy4i6H9xaY8HzWv9v_
> jhOnp5OyEpJcG52RP3nHorU8&s=PcBHWXeL0_5KMWSkRgj5UXk640tXb20rGH9sO98oR2w&e=
> >>>>>>
> >>>>>>
> >>>>>          [[alternative HTML version deleted]]
> >>>>>
> >>>>> _______________________________________________
> >>>>> Bioc-devel at r-project.org mailing list
> >>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
> ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=
> BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=Rhy4i6H9xaY8HzWv9v_
> jhOnp5OyEpJcG52RP3nHorU8&s=PcBHWXeL0_5KMWSkRgj5UXk640tXb20rGH9sO98oR2w&e=
> >>>>>
> >>>>>
> >>>>
> >>>> This email message may contain legally privileged and/or confidential
> >>>> information.  If you are not the intended recipient(s), or the
> employee or
> >>>> agent responsible for the delivery of this message to the intended
> >>>> recipient(s), you are hereby notified that any disclosure, copying,
> >>>> distribution, or use of this email message is prohibited.  If you have
> >>>> received this message in error, please notify the sender immediately
> by
> >>>> e-mail and delete this email message from your computer. Thank you.
> >>>>
> >>>
> >>>     [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> Bioc-devel at r-project.org mailing list
> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
> ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=
> BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=Rhy4i6H9xaY8HzWv9v_
> jhOnp5OyEpJcG52RP3nHorU8&s=PcBHWXeL0_5KMWSkRgj5UXk640tXb20rGH9sO98oR2w&e=
> >>>
> >> _______________________________________________
> >> Bioc-devel at r-project.org mailing list
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.
> ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwIDaQ&c=eRAMFD45gAfqt84VtBcfhQ&r=
> BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=Rhy4i6H9xaY8HzWv9v_
> jhOnp5OyEpJcG52RP3nHorU8&s=PcBHWXeL0_5KMWSkRgj5UXk640tXb20rGH9sO98oR2w&e=
> >
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fredhutch.org
> > Phone:  (206) 667-5791
> > Fax:    (206) 667-1319
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list