[Bioc-devel] Need for file-based handling of meta-data

Thomas Girke thomas.girke at ucr.edu
Wed Jun 29 21:42:52 CEST 2016

Yes, a "readSummarizedExperiment" would be a "modern-day analog of
Biobase::readExpressionSet". I also agree with the other suggestions,
including using GitHub to get this started, and with Vince's thoughts on
binding meta-data more tightly to source data as well as improving

As suggested, I am sharing this discussion with the bioc-devel list.


On Wed, Jun 29, 2016 at 06:22:49PM +0000, Vincent Carey wrote:

> Thanks Thomas -- I think this should be circulated to biocore for further comments.  I agree
> that we need to do a better job of demonstrating the value of a) binding metadata to data,
> b) using standard containers throughout workflows, and c) allowing interoperation.  I learned
> some useful things about spreadsheet interoperation at the conference and need to learn more.
> In a sense, we are giving a specific implementation of some of the rules in
> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285,
> and I wonder whether we could come up with another topic for the "Ten Simple Rules"
> series that addresses these concerns, or do something similar, perhaps for F1000Research,
> with a Bioconductor-interoperability focus on metadata.

> On Wed, Jun 29, 2016 at 06:28:49PM +0000, Martin Morgan wrote:

> I guess you mean a modern-day analog of Biobase::readExpressionSet? I
> like the idea of templates, and also of drafting a 'Ten Steps Toward
> Reproducibility in R / Bioconductor'. I would be happy to start a GitHub
> repo for it if there are any takers...
> Martin

> On 06/29/2016 01:57 PM, Thomas Girke wrote:
> > Hi Vince and Martin,
> >
> > It was great seeing you at the Bioc conference, and thanks for all the
> > time you spent organizing it. As always, it was a great success, with a
> > lot of inspiring presentations and discussions.
> >
> > In one of our discussions, you asked me for feedback on why I think
> > handling meta-data is currently not straightforward for non-expert
> > users of Bioc packages, such as biologists, data analysts, or developers
> > coming from other languages.
> >
> > In my opinion, one main reason for this difficulty is that there is no
> > formal utility for importing meta-data from external files (e.g.,
> > tabular, JSON, or other formats). SummarizedExperiment has all these
> > great features, but it is not intuitive to non-expert users how to
> > import their data into the final object. For a developer it is easy to
> > write a custom import function, but not for non-R programmers. This need
> > could be addressed easily by providing an import function that reads
> > user-provided meta-data (optionally along with assay/range data)
> > directly into SummarizedExperiment (and/or RangedSummarizedExperiment)
> > objects. To the best of my knowledge, a readSummarizedExperiment
> > function is currently not available, but I might be wrong.
> >
> > Almost equally important would be an export function, so that users can
> > easily report intermediate results and also share them with software
> > outside of R. Clearly, for the latter purpose, exporting to an R data
> > file (e.g., .rda) is not an option.
> >
> > The import step in particular overlaps substantially with how we
> > communicate with experimentalists via spreadsheets, a topic we discussed
> > quite a bit at the meeting. Providing one or two best-practice templates
> > for organizing experiments in the 'spirit' of SummarizedExperiment could
> > help educate scientists on how to format their meta-data in Excel or
> > Google Sheets so that it is easier to process. This would also improve
> > reproducibility, since many sample-handling errors happen right at this
> > level. As an example file, one could use the colData sample currently
> > used in the SummarizedExperiment vignette.
> >
> > That's really all. 
> >
> > Best,
> >
> > Thomas
> >
