[Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods

stefano mangiolastefano using gmail.com
Fri Feb 7 06:30:57 CET 2020


Would this scenario satisfy "make the package _directly_ compatible with
standard Bioconductor data structures"?

If the input is a SummarizedExperiment, return a SummarizedExperiment; if
the input is a tbl_df or ttBulk, return a ttBulk?
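
A minimal sketch of that scenario, assuming a single S3 generic with one
method per input type. `ToySE` is a hypothetical stand-in for
SummarizedExperiment (so the sketch runs without Bioconductor installed),
and `scale_abundance()` is an illustrative name, not an actual ttBulk
function:

```r
library(methods)

# Hypothetical stand-in for SummarizedExperiment; a real package would
# import the actual class instead of defining this toy one.
setClass("ToySE", representation(counts = "matrix"))

# Hypothetical S3 generic (not part of the ttBulk API).
scale_abundance <- function(x, ...) UseMethod("scale_abundance")

# Tidy input (data.frame, and therefore tbl_df): return a table of the
# same kind, with per-sample proportions added as a new column.
scale_abundance.data.frame <- function(x, ...) {
  total <- ave(x$abundance, x$sample, FUN = sum)
  x$scaled <- x$abundance / total
  x
}

# S4 input: work on the matrix and return an object of the same S4 class.
scale_abundance.ToySE <- function(x, ...) {
  x@counts <- sweep(x@counts, 2, colSums(x@counts), "/")
  x
}

tidy <- data.frame(
  sample     = c("s1", "s1", "s2", "s2"),
  transcript = c("t1", "t2", "t1", "t2"),
  abundance  = c(1, 3, 2, 2)
)
se <- new("ToySE", counts = matrix(
  c(1, 3, 2, 2), nrow = 2,
  dimnames = list(c("t1", "t2"), c("s1", "s2"))
))

res_tidy <- scale_abundance(tidy)  # data.frame in, data.frame out
res_se   <- scale_abundance(se)    # ToySE in, ToySE out
```

The caller never converts explicitly; the return type simply follows the
input type.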


Best wishes.

*Stefano *



Stefano Mangiola | Postdoctoral fellow

Papenfuss Laboratory

The Walter and Eliza Hall Institute of Medical Research

+61 (0)466452544


On Fri, 7 Feb 2020 at 16:15, Michael Lawrence
<lawrence.michael using gene.com> wrote:

> I would urge you to make the package _directly_ compatible with
> standard Bioconductor data structures; no explicit conversion. But you
> can create wrapper methods (even on an S3 generic) that perform the
> conversion automatically. You'll probably want two separate APIs
> though (in different styles); for one thing, automatic conversion is
> obviously not possible for return values.
>
> Michael
>
> On Thu, Feb 6, 2020 at 5:34 PM stefano <mangiolastefano using gmail.com> wrote:
> >
> > Thanks Michael,
> >
> > yes, in a sense ttBulk and SummarizedExperiment can be considered two
> > interfaces. Would it be fair enough to create a function that converts
> > from one to the other, although the default would be ttBulk?
> >
> > > I'm not sure the tidyverse is a great answer to the user interface,
> > > because it lacks domain semantics
> >
> > Would it be fair to say that the ttBulk class could be considered a
> > tibble with specific semantics? In the sense that it holds information
> > about key column names (.sample, .transcript, .abundance,
> > .normalised_abundance, etc.), and has a validator (that is triggered in
> > every ttBulk function).
> >
> > I think at the moment, given (i) the S3 problem, and (ii) the lack of a
> > formal foundation for a SummarizedExperiment interface (which might
> > itself require S4, where SummarizedExperiment could be a slot?), my
> > package would belong more on CRAN, until those two issues have been
> > resolved.
> >
> > I imagine there are not many cases where a CRAN package migrated to
> > Bioconductor after complying with the ecosystem policies.
> >
> > Thanks a lot.
> >
> > Best wishes.
> >
> > Stefano
> >
> > On Fri, 7 Feb 2020 at 12:12, Michael Lawrence
> > <lawrence.michael using gene.com> wrote:
> >>
> >> There's a difference between implementing software, where one wants
> >> formal data structures, and providing a convenient user interface.
> >> Software needs to interface with other software, so a package could
> >> provide both types of interfaces, one based on rich (S4) data
> >> structures, another on simpler structures with an API more amenable to
> >> analysis. I'm not sure the tidyverse is a great answer to the user
> >> interface, because it lacks domain semantics. This is still an active
> >> area of research (see Stuart Lee's plyranges, for example). I hope you
> >> can find a reasonable compromise that enables you to integrate ttBulk
> >> into Bioconductor, so that it can take advantage of the synergies the
> >> ecosystem provides.
> >>
> >> PS: There is no simple fix for your example.
> >>
> >> Michael
> >>
> >> On Thu, Feb 6, 2020 at 4:12 PM stefano <mangiolastefano using gmail.com>
> wrote:
> >> >
> >> > Thanks a lot for your comments, Martin and Michael,
> >> >
> >> > Here I reply to Martin's comment. Michael, I will try to implement your
> >> > solution!
> >> >
> >> > I think a key point from
> >> > https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106
> >> > (that I had overlooked) is
> >> >
> >> > *>> "So to sum up: if you submit a package to Bioconductor, there is an
> >> > expectation that your package can work seamlessly with other
> >> > Bioconductor packages, and your implementation should support that. The
> >> > safest and easiest way to do that is to use Bioconductor data structures"*
> >> >
> >> > In this case my package would not be suited, as I do not use pre-existing
> >> > Bioconductor data structures; instead I see value in using a simple
> >> > tibble, for the reasons explained in part in the README
> >> > https://github.com/stemangiola/ttBulk (harnessing the power of tidyverse
> >> > and friends for bulk transcriptomic analyses).
> >> >
> >> > *>> "with the minimum standard of being able to accept such objects even
> >> > if you do not rely on them internally (though you should)"*
> >> >
> >> > With this I can comply, in the sense that I can build converters to and
> >> > from SummarizedExperiment (for example).
> >> >
> >> > * >> "If you don't want to do that, then that's a shame, but it would
> >> > suggest that Bioconductor would not be the right place to host this
> >> > package."*
> >> >
> >> > Well said.
> >> >
> >> > In summary, I do not rely on Bioconductor data structures, as I am
> >> > proposing another paradigm, but my back end consists largely of
> >> > Bioconductor analysis packages that I would like to interface with
> >> > tidyverse. So
> >> >
> >> > 1) Should I build converters to Bioc. data structures, and force the use
> >> > of S3 objects (needed for tidyverse to work), or
> >> > 2) Submit to CRAN
> >> >
> >> > I don't have strong feelings for either, although I think Bioconductor
> >> > would be a good fit. Community, please give me your honest opinions; I
> >> > will take them seriously and proceed.
> >> >
> >> >
> >> >
> >> > Best wishes.
> >> >
> >> > *Stefano *
> >> >
> >> > On Fri, 7 Feb 2020 at 10:46, Martin Morgan
> >> > <mtmorgan.bioc using gmail.com> wrote:
> >> >
> >> > > The idea isn't to use S4 at any cost, but to 'play well' with the
> >> > > Bioconductor ecosystem, including writing robust and maintainable code.
> >> > >
> >> > > This comment
> >> > > https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106
> >> > > provides some motivation; there was also an interesting exchange on the
> >> > > Bioconductor community slack about this (join at
> >> > > https://bioc-community.herokuapp.com/; discussion starting with
> >> > > https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800).
> >> > > The plyranges package http://bioconductor.org/packages/plyranges and
> >> > > recently accepted fluentGenomics workflow
> >> > > https://github.com/Bioconductor/Contributions/issues/1350 provide
> >> > > illustrations.
> >> > >
> >> > > In your domain it's really surprising that your package does not use
> >> > > (Import or Depend on) SummarizedExperiment or GenomicRanges packages.
> >> > > From a superficial look at your package, it seems like something like
> >> > > `reduce_dimensions()` could be defined to take & return a
> >> > > SummarizedExperiment and hence benefit from some of the points in the
> >> > > github issue comment mentioned above.
> >> > >
> >> > > Certainly there is a useful transition, both 'on the way in' to a
> >> > > SummarizedExperiment, and after leaving the more specialized
> >> > > bioinformatic computations to, e.g., display a pairs plot of the reduced
> >> > > dimensions, where one might re-shape the data to a tidy format and use
> >> > > 'plain old' tibbles; the fluentGenomics workflow might provide some
> >> > > guidance.
> >> > >
> >> > > At the end of the day it would not be surprising for Bioconductor
> >> > > packages to make use of tidy concepts and data structures, particularly
> >> > > in the vignette, and it would be a mistake for Bioconductor to exclude
> >> > > well-motivated 'tidy' representations.
> >> > >
> >> > > Martin Morgan
> >> > >
> >> > > On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" <
> >> > > bioc-devel-bounces using r-project.org on behalf of
> mangiolastefano using gmail.com>
> >> > > wrote:
> >> > >
> >> > >     Hello,
> >> > >
> >> > >     I have a package (ttBulk) under review. I have been told to replace
> >> > >     the S3 system with S4. My package is based on the class tbl_df and
> >> > >     must be fully compatible with tidyverse methods (inheritance). After
> >> > >     some tests and research I understood that the tidyverse ecosystem is
> >> > >     not compatible with S4 classes.
> >> > >
> >> > >     For example, several methods apparently do not handle S4 objects
> >> > >     based on the S3 class tbl_df
> >> > >
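> >> > >     ```
> >> > >     library(tidyverse)
> >> > >
> >> > >     # Register the S3 class so that an S4 class can extend it
> >> > >     setOldClass("tbl_df")
> >> > >     setClass("test2", contains = "tbl_df")
> >> > >     my <- new("test2",  tibble(a = 1))
> >> > >
> >> > >     # mutate() works on the S4 object
> >> > >     my %>%  mutate(b = 3)
> >> > >
> >> > >        a b
> >> > >     1 1 3
> >> > >     ```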
> >> > >
> >> > >     ```
> >> > >     my <- new("test2",  tibble(a = rnorm(100), b = 1))
> >> > >
> >> > >     # nest() fails on an object of the same class
> >> > >     my %>% nest(data = -b)
> >> > >     Error: `x` must be a vector, not a `test2` object
> >> > >     Run `rlang::last_error()` to see where the error occurred.
> >> > >     ```
> >> > >
> >> > >     Could you please advise whether a tidyverse-based package can be
> >> > >     hosted on Bioconductor, and whether S4 classes are really mandatory?
> >> > >     I need to understand if I am forced to submit to CRAN instead
> >> > >     (although Bioconductor would be a good fit, since I try to interface
> >> > >     transcriptional analysis tools with the tidy universe)
> >> > >
> >> > >
> >> > >     Thanks a lot.
> >> > >     Stefano
> >> > >
> >> > >         [[alternative HTML version deleted]]
> >> > >
> >> > >     _______________________________________________
> >> > >     Bioc-devel using r-project.org mailing list
> >> > >     https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >> > >
> >> > >
> >> >
> >>
> >>
> >>
> >> --
> >> Michael Lawrence
> >> Senior Scientist, Bioinformatics and Computational Biology
> >> Genentech, A Member of the Roche Group
> >> Office +1 (650) 225-7760
> >> michafla using gene.com
> >>
