[Bioc-devel] Compatibility of Bioconductor with tidyverse S3 classes/methods

stefano mangiolastefano using gmail.com
Fri Feb 7 02:34:24 CET 2020


Thanks Michael,

yes, in a sense ttBulk and SummarizedExperiment can be considered as two
interfaces. Would it be fair enough to create functions that convert from
one to the other, even if the default would be ttBulk?
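
A minimal sketch of such converters, assuming a long table with one row per
sample/transcript pair and numeric abundances. The function names
`tidy_to_se()` and `se_to_tidy()` and the assay name `counts` are
hypothetical, not part of ttBulk or of the SummarizedExperiment API:

```r
library(SummarizedExperiment)  # Bioconductor package

# Hypothetical helper: long (tidy) table -> SummarizedExperiment.
# tapply() builds a transcript x sample matrix (transcripts on rows,
# samples on columns); combinations absent from the table become NA.
tidy_to_se <- function(x, .sample, .transcript, .abundance) {
  counts <- tapply(x[[.abundance]],
                   list(x[[.transcript]], x[[.sample]]),
                   sum)
  SummarizedExperiment(assays = list(counts = counts))
}

# Hypothetical inverse: SummarizedExperiment -> long data frame.
# as.vector() flattens the matrix column by column, so transcripts
# cycle within each sample.
se_to_tidy <- function(se, .sample = "sample", .transcript = "transcript",
                       .abundance = "abundance") {
  counts <- assay(se, "counts")
  out <- data.frame(
    rep(rownames(counts), times = ncol(counts)),
    rep(colnames(counts), each = nrow(counts)),
    as.vector(counts),
    stringsAsFactors = FALSE
  )
  names(out) <- c(.transcript, .sample, .abundance)
  out
}
```

Sample and transcript annotations (colData, rowData) are left out here for
brevity; a real converter would also need to carry them across.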

> I'm not sure the tidyverse is a great answer to the user interface,
> because it lacks domain semantics

Would it be fair to say that the ttBulk class could be considered a tibble
with specific semantics? It holds information about key column names
(.sample, .transcript, .abundance, .normalised_abundance, etc.), and has a
validator that is triggered by every ttBulk function.
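
A minimal sketch of what such a tibble subclass could look like, assuming
the key column names are stored as an attribute and a validator runs at
construction. The names `new_tt_bulk()`, `validate_tt_bulk()`, `tt_bulk` and
`tt_columns` are hypothetical, not ttBulk's actual implementation:

```r
library(tibble)

# Hypothetical constructor: record the key column names as an attribute
# and prepend an S3 subclass, so the object is still a tbl_df and
# dispatches to tidyverse methods through S3 inheritance.
new_tt_bulk <- function(x, .sample, .transcript, .abundance) {
  x <- as_tibble(x)
  attr(x, "tt_columns") <- c(.sample = .sample,
                             .transcript = .transcript,
                             .abundance = .abundance)
  class(x) <- c("tt_bulk", class(x))
  validate_tt_bulk(x)
}

# Hypothetical validator: the declared key columns must exist and the
# abundance column must be numeric; returns the object unchanged.
validate_tt_bulk <- function(x) {
  cols <- attr(x, "tt_columns")
  missing_cols <- setdiff(cols, names(x))
  if (length(missing_cols) > 0)
    stop("missing key columns: ", paste(missing_cols, collapse = ", "))
  if (!is.numeric(x[[cols[[".abundance"]]]]))
    stop("the abundance column must be numeric")
  x
}
```

Depending on the dplyr/tidyr version, some verbs may drop the subclass or
the attribute, which is one reason to re-run the validator at every step.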

I think that at the moment, given (i) the S3 problem, and (ii) the lack of a
formal foundation for a SummarizedExperiment interface (which might itself
require S4, with the SummarizedExperiment as a slot?), my package would
belong more on CRAN until those two issues have been resolved.

I imagine there are not many cases where a CRAN package migrated to
Bioconductor after complying with the ecosystem policies.

Thanks a lot.

Best wishes.

Stefano



Stefano Mangiola | Postdoctoral fellow

Papenfuss Laboratory

The Walter Eliza Hall Institute of Medical Research

+61 (0)466452544


On Fri, 7 Feb 2020 at 12:12, Michael Lawrence <
lawrence.michael using gene.com> wrote:

> There's a difference between implementing software, where one wants
> formal data structures, and providing a convenient user interface.
> Software needs to interface with other software, so a package could
> provide both types of interfaces, one based on rich (S4) data
> structures, another on simpler structures with an API more amenable to
> analysis. I'm not sure the tidyverse is a great answer to the user
> interface, because it lacks domain semantics. This is still an active
> area of research (see Stuart Lee's plyranges, for example). I hope you
> can find a reasonable compromise that enables you to integrate ttBulk
> into Bioconductor, so that it can take advantage of the synergies the
> ecosystem provides.
>
> PS: There is no simple fix for your example.
>
> Michael
>
> On Thu, Feb 6, 2020 at 4:12 PM stefano <mangiolastefano using gmail.com> wrote:
> >
> > Thanks a lot for your comments, Martin and Michael,
> >
> > Here I reply to Martin's comment. Michael, I will try to implement your
> > solution!
> >
> > I think a key point from
> > https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106
> > (which I had overlooked) is
> >
> > >> "So to sum up: if you submit a package to Bioconductor, there is an
> > >> expectation that your package can work seamlessly with other Bioconductor
> > >> packages, and your implementation should support that. The safest and
> > >> easiest way to do that is to use Bioconductor data structures"
> >
> > In this case my package would not be suitable, as I do not use pre-existing
> > Bioconductor data structures; instead, I see value in using a simple
> > tibble, for the reasons partly explained in the README
> > https://github.com/stemangiola/ttBulk (harvesting the power of tidyverse
> > and friends for bulk transcriptomic analyses).
> >
> > >> "with the minimum standard of being able to accept such objects even if
> > >> you do not rely on them internally (though you should)"
> >
> > I can comply with this, in the sense that I can build converters to and
> > from SummarizedExperiment (for example).
> >
> > >> "If you don't want to do that, then that's a shame, but it would
> > >> suggest that Bioconductor would not be the right place to host this
> > >> package."
> >
> > Well said.
> >
> > In summary, I do not rely on Bioconductor data structures, as I am
> > proposing another paradigm, but my back end is largely made of
> > Bioconductor analysis packages that I would like to interface with the
> > tidyverse. So, should I:
> >
> > 1) build converters to Bioconductor data structures, and keep using S3
> > objects (needed for the tidyverse to work), or
> > 2) submit to CRAN?
> >
> > I don't have a strong preference either way, although I think
> > Bioconductor would be a good fit. I would welcome the community's honest
> > opinions; I will take them seriously and proceed accordingly.
> >
> >
> >
> > Best wishes.
> >
> > Stefano
> >
> >
> > On Fri, 7 Feb 2020 at 10:46, Martin Morgan <
> > mtmorgan.bioc using gmail.com> wrote:
> >
> > > The idea isn't to use S4 at any cost, but to 'play well' with the
> > > Bioconductor ecosystem, including writing robust and maintainable code.
> > >
> > > This comment
> > > https://github.com/Bioconductor/Contributions/issues/1355#issuecomment-580977106
> > > provides some motivation; there was also an interesting exchange on the
> > > Bioconductor community Slack about this (join at
> > > https://bioc-community.herokuapp.com/; discussion starting with
> > > https://community-bioc.slack.com/archives/C35G93GJH/p1580144746014800).
> > > The plyranges package http://bioconductor.org/packages/plyranges and the
> > > recently accepted fluentGenomics workflow
> > > https://github.com/Bioconductor/Contributions/issues/1350 provide
> > > illustrations.
> > >
> > > In your domain it's really surprising that your package does not use
> > > (Import or Depend on) SummarizedExperiment or GenomicRanges packages.
> > > From a superficial look at your package, it seems like something like
> > > `reduce_dimensions()` could be defined to take & return a
> > > SummarizedExperiment and hence benefit from some of the points in the
> > > GitHub issue comment mentioned above.
> > >
> > > Certainly there is a useful transition, both 'on the way in' to a
> > > SummarizedExperiment, and after leaving the more specialized bioinformatic
> > > computations to, e.g., display a pairs plot of the reduced dimensions,
> > > where one might re-shape the data to a tidy format and use 'plain old'
> > > tibbles; the fluentGenomics workflow might provide some guidance.
> > >
> > > At the end of the day it would not be surprising for Bioconductor packages
> > > to make use of tidy concepts and data structures, particularly in the
> > > vignette, and it would be a mistake for Bioconductor to exclude
> > > well-motivated 'tidy' representations.
> > >
> > > Martin Morgan
> > >
> > > On 2/6/20, 5:46 PM, "Bioc-devel on behalf of stefano" <
> > > bioc-devel-bounces using r-project.org on behalf of mangiolastefano using gmail.com>
> > > wrote:
> > >
> > >     Hello,
> > >
> > >     I have a package (ttBulk) under review. I have been told to replace
> > >     the S3 system with S4. My package is based on the class tbl_df and
> > >     must be fully compatible with tidyverse methods (inheritance). After
> > >     some tests and research, I understood that the tidyverse ecosystem is
> > >     not compatible with S4 classes.
> > >
> > >     For example, several methods apparently do not handle S4 objects
> > >     based on the S3 class tbl_df:
> > >
> > >     ```
> > >     library(tidyverse)
> > >     # Register the S3 class and define an S4 class that extends it
> > >     setOldClass("tbl_df")
> > >     setClass("test2", contains = "tbl_df")
> > >     my <- new("test2", tibble(a = 1))
> > >     # mutate() works
> > >     my %>% mutate(b = 3)
> > >     #>   a b
> > >     #> 1 1 3
> > >     ```
> > >
> > >     ```
> > >     # nest() fails on the same kind of object
> > >     my <- new("test2", tibble(a = rnorm(100), b = 1))
> > >     my %>% nest(data = -b)
> > >     #> Error: `x` must be a vector, not a `test2` object
> > >     #> Run `rlang::last_error()` to see where the error occurred.
> > >     ```
> > >
> > >     Could you please advise whether a tidyverse-based package can be
> > >     hosted on Bioconductor, and whether S4 classes are really mandatory?
> > >     I need to understand whether I am forced to submit to CRAN instead
> > >     (although Bioconductor would be a good fit, since I try to interface
> > >     transcriptional analysis tools with the tidy universe).
> > >
> > >
> > >     Thanks a lot.
> > >     Stefano
> > >
> > >
> > >     _______________________________________________
> > >     Bioc-devel using r-project.org mailing list
> > >     https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >
> > >
> >
>
>
>
> --
> Michael Lawrence
> Senior Scientist, Bioinformatics and Computational Biology
> Genentech, A Member of the Roche Group
> Office +1 (650) 225-7760
> michafla using gene.com
>
>



