[Bioc-devel] Need for file-based handling of meta-data

Thomas Girke thomas.girke at ucr.edu
Thu Jun 30 04:25:57 CEST 2016


Great, thanks. I will add some ideas later this week or next week.

Thomas


On Wed, Jun 29, 2016 at 12:44 PM Martin Morgan <
martin.morgan at roswellpark.org> wrote:

> On 06/29/2016 03:42 PM, Thomas Girke wrote:
> > Yes, a "readSummarizedExperiment" would be a "modern-day analog of
> > Biobase::readExpressionSet". I also agree with the other suggestions
> > including github to get this started, and Vince's thoughts on binding
> > meta-data more tightly to source data as well as improving
> > interoperability.
>
> I started a repository at
>
>    https://github.com/Bioconductor/TenStepReproducible
>
> I envision this as a package / white paper / eventually a publication.
> Feel free to fork etc., and / or to contribute other ideas.
>
> Martin
>
> >
> > As suggested I am sharing this discussion with the bioc-devel list.
> >
> > Thomas
> >
> > On Wed, Jun 29, 2016 at 06:22:49PM +0000, Vincent Carey wrote:
> >
> >> Thanks Thomas -- I think this should be circulated to biocore for
> >> further comments.  I am in agreement that we need to do a better job
> >> at demonstrating the value of a) binding metadata to data, b) using
> >> standard containers through workflows, c) allowing interoperation.
> >> I learned some useful things about spreadsheet interoperation at the
> >> conference and need to learn more.
> >>
> >> In a sense we are giving a specific implementation of some of the rules
> >> in
> >>
> >> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285
> >>
> >> and I wonder whether we could come up with another topic for the "ten
> >> simple rules" series that addresses these concerns, or do something
> >> similar, perhaps for F1000Research, with a Bioconductor-interoperability
> >> focus on metadata.
> >
> >
> >> On Wed, Jun 29, 2016 at 06:28:49PM +0000, Martin Morgan wrote:
> >
> >> I guess you mean a modern-day analog of Biobase::readExpressionSet ? I
> >> like the idea of templates, and also drafting a 'Ten Steps Toward
> >> Reproducibility in R / Bioconductor'. Would be happy to start a github
> >> repo for same if there are any takers...
> >>
> >> Martin
> >
> >> This email message may contain legally privileged and/or confidential
> >> information.  If you are not the intended recipient(s), or the employee
> >> or agent responsible for the delivery of this message to the intended
> >> recipient(s), you are hereby notified that any disclosure, copying,
> >> distribution, or use of this email message is prohibited.  If you have
> >> received this message in error, please notify the sender immediately by
> >> e-mail and delete this email message from your computer. Thank you.
> >
> >
> >> On 06/29/2016 01:57 PM, Thomas Girke wrote:
> >>> Hi Vince and Martin,
> >>>
> >>> It was great seeing you at the Bioc conference, and thanks for all your
> >>> time organizing the conference. As always it was a great success with a
> >>> lot of inspiring presentations and discussions.
> >>>
> >>> In one of our discussions you asked me for feedback on why I think
> >>> handling of meta-data is currently not straightforward for non-expert
> >>> users of Bioc packages such as biologists, data analysts or developers
> >>> coming from other languages.
> >>>
> >>> In my opinion, one main reason for this difficulty is that there is no
> >>> formal utility provided for importing meta-data from external files
> >>> (e.g. tabular, json or other formats). SummarizedExperiment has a lot
> >>> of great functionality, but it is not intuitive to non-expert users
> >>> how to import the data into the final object. For a developer it is
> >>> easy to write a custom import function, but not for non-R programmers.
> >>> This need could be addressed simply by providing an import function
> >>> that reads meta-data (optionally along with assay/range data)
> >>> provided by the user directly into SummarizedExperiment objects (and/or
> >>> RangedSummarizedExperiment). To the best of my knowledge, a
> >>> readSummarizedExperiment is currently not available, but I might be
> >>> wrong?
> >>>
> >>> Almost equally important would be an export function so that users can
> >>> easily report intermediate results and also share them with external
> >>> software outside of R. Clearly, for the latter need, exporting to an
> >>> R data (.rda) file is not an option.
> >>>
> >>> The import step in particular overlaps substantially with how we
> >>> communicate with experimentalists via spreadsheets, a topic we
> >>> discussed at the meeting quite a bit. Providing one or two best
> >>> practice templates of how to organize experiments in the 'spirit' of
> >>> SummarizedExperiment could help to teach scientists how to format
> >>> their meta-data in Excel or Google Sheets so that it is easier to
> >>> process. This would also improve reproducibility since many sample
> >>> handling errors happen right at this level. As an example file, one
> >>> could use the current colData sample used by the SummarizedExperiment
> >>> vignette.
> >>>
> >>> That's really all.
> >>>
> >>> Best,
> >>>
> >>> Thomas
> >>>
> >>
> >>
>
>
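
The import and export steps proposed in the quoted message can be sketched in a language-neutral way. What follows is only a minimal illustration, written in Python rather than R, and it does not use any Bioconductor API. The file contents, the function names read_experiment and write_sample_sheet, and the dictionary layout are all assumptions made for this example. It shows the one check such an importer would plausibly need: the sample IDs in the meta-data sheet must match the assay columns, in the same order, before the two are bound together.

```python
import csv
import io

# Hypothetical sample sheet (one row per sample) and count table
# (one column per sample). The contents are invented for illustration.
SAMPLE_SHEET = (
    "sample\ttreatment\treplicate\n"
    "S1\tcontrol\t1\n"
    "S2\tcontrol\t2\n"
    "S3\ttreated\t1\n"
)
COUNTS = (
    "gene\tS1\tS2\tS3\n"
    "geneA\t10\t12\t30\n"
    "geneB\t0\t3\t1\n"
)


def read_table(text):
    """Parse tab-delimited text into (header, list of rows)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    return rows[0], rows[1:]


def read_experiment(sample_text, counts_text):
    """Bind sample meta-data to assay data, refusing mismatched sample IDs."""
    meta_header, meta_rows = read_table(sample_text)
    count_header, count_rows = read_table(counts_text)
    sample_ids = [row[0] for row in meta_rows]
    assay_ids = count_header[1:]
    if sample_ids != assay_ids:
        raise ValueError(
            "sample sheet rows %r do not match assay columns %r"
            % (sample_ids, assay_ids)
        )
    return {
        "id_column": meta_header[0],
        "col_data": {r[0]: dict(zip(meta_header[1:], r[1:])) for r in meta_rows},
        "row_names": [r[0] for r in count_rows],
        "assay": [[int(v) for v in r[1:]] for r in count_rows],
    }


def write_sample_sheet(experiment):
    """Export the sample meta-data back to tab-delimited text."""
    col_data = experiment["col_data"]
    fields = list(next(iter(col_data.values())))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([experiment["id_column"]] + fields)
    for sample, values in col_data.items():
        writer.writerow([sample] + [values[f] for f in fields])
    return out.getvalue()


experiment = read_experiment(SAMPLE_SHEET, COUNTS)
print(experiment["col_data"]["S3"]["treatment"])
print(write_sample_sheet(experiment) == SAMPLE_SHEET)
```

Failing loudly on mismatched sample IDs is a deliberate choice in this sketch, since the quoted message notes that many sample handling errors happen at the spreadsheet level. Writing the sheet back out as plain tab-delimited text keeps it readable by software outside of R.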
