[Bioc-devel] Need for file-based handling of meta-data

Martin Morgan martin.morgan at roswellpark.org
Thu Jun 30 01:58:34 CEST 2016

On 06/29/2016 03:42 PM, Thomas Girke wrote:
> Yes, a "readSummarizedExperiment" would be a "modern-day analog of
> Biobase::readExpressionSet". I also agree with the other suggestions
> including github to get this started, and Vince's thoughts on binding
> meta-data more tightly to source data as well as improving
> interoperability.

I started a repository at


I envision this as a package / white paper / eventually a publication.
Feel free to fork etc., and / or to contribute other ideas.
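
To make the discussion concrete, here is a minimal sketch of what a
"readSummarizedExperiment" along the lines of the quoted suggestion might
look like. Everything in it is an assumption for illustration only: the
function names (readExperimentTables, readSummarizedExperiment,
writeExperimentTables) are hypothetical and do not exist in any package,
and the file layout (tab-delimited text, IDs in the first column) is just
one possible template. The one check it performs is that every assay
column has a matching row of sample meta-data.

```r
# Hypothetical helpers: none of these functions exist in SummarizedExperiment.
# Assumed layout: tab-delimited text files whose first column holds the IDs
# (feature IDs in the assay file, sample IDs in the meta-data file).

readExperimentTables <- function(assayFile, colDataFile) {
    # Assay values: rows are features, columns are samples.
    assay <- as.matrix(read.table(assayFile, header = TRUE, sep = "\t",
                                  row.names = 1, check.names = FALSE))
    # Sample meta-data: one row per sample.
    colData <- read.table(colDataFile, header = TRUE, sep = "\t",
                          row.names = 1, check.names = FALSE,
                          stringsAsFactors = FALSE)
    # Fail early if a sample in the assay has no meta-data row.
    missing <- setdiff(colnames(assay), rownames(colData))
    if (length(missing) > 0L)
        stop("samples without meta-data: ", paste(missing, collapse = ", "))
    # Reorder the meta-data so its rows line up with the assay columns.
    idx <- match(colnames(assay), rownames(colData))
    list(assay = assay, colData = colData[idx, , drop = FALSE])
}

readSummarizedExperiment <- function(assayFile, colDataFile) {
    tabs <- readExperimentTables(assayFile, colDataFile)
    # Requires the SummarizedExperiment and S4Vectors packages.
    SummarizedExperiment::SummarizedExperiment(
        assays = list(counts = tabs$assay),
        colData = S4Vectors::DataFrame(tabs$colData, check.names = FALSE))
}

writeExperimentTables <- function(assay, colData, assayFile, colDataFile) {
    # Write the IDs as an explicit first column so that spreadsheets and
    # non-R software can read the files without special handling.
    write.table(data.frame(feature = rownames(assay), assay,
                           check.names = FALSE),
                assayFile, sep = "\t", quote = FALSE, row.names = FALSE)
    write.table(data.frame(sample = rownames(colData), colData,
                           check.names = FALSE),
                colDataFile, sep = "\t", quote = FALSE, row.names = FALSE)
}
```

With an existing object 'se', the export side could be called as
writeExperimentTables(assay(se), as.data.frame(colData(se)), "assay.txt",
"coldata.txt"). Keeping the table parsing separate from the object
construction is only a design choice of this sketch; it lets the template
check be exercised without the Bioconductor packages installed.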


> As suggested I am sharing this discussion with the bioc-devel list.
> Thomas
> On Wed, Jun 29, 2016 at 06:22:49PM +0000, Vincent Carey wrote:
>> Thanks Thomas -- I think this should be circulated to biocore for further comments.  I am in agreement
>> that we need to do a better job at both demonstrating the values of a) binding metadata to data, b)
>> using standard containers through workflows, c) allowing interoperation.  I learned some useful things
>> about spreadsheet interoperation at the conference and need to learn more.
>> In a sense we are giving a specific implementation of some of the rules in
>> http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285
>> and I wonder whether we could come up with another topic for the "ten simple rules"
>> series that addresses these concerns, or do something similar, perhaps for F1000Research,
>> with a Bioconductor-interoperability focus on metadata.
>> On Wed, Jun 29, 2016 at 06:28:49PM +0000, Martin Morgan wrote:
>> I guess you mean a modern-day analog of Biobase::readExpressionSet ? I
>> like the idea of templates, and also drafting a 'Ten Steps Toward
>> Reproducibility in R / Bioconductor'. Would be happy to start a github
>> repo for same if there are any takers...
>> Martin
>> On 06/29/2016 01:57 PM, Thomas Girke wrote:
>>> Hi Vince and Martin,
>>> It was great seeing you at the Bioc conference, and thanks for all your
>>> time organizing the conference. As always it was a great success with a
>>> lot of inspiring presentations and discussions.
>>> In one of our discussions you asked me for feedback on why I think handling
>>> of meta-data is currently not straightforward for non-expert users of
>>> Bioc packages such as biologists, data analysts or developers coming
>>> from other languages.
>>> In my opinion, one main reason for this difficulty is that there is no
>>> formal utility provided for importing meta-data from external files
>>> (e.g. tabular, JSON or other formats). SummarizedExperiment has all
>>> these great functionalities, but it is not intuitive to non-expert users
>>> how to import the data into the final object. For a developer it is easy
>>> to write a custom import function, but not for non-R programmers.
>>> This need could easily be addressed by providing an import function
>>> that could read meta-data (optionally along with assay/range data)
>>> provided by the user directly into SummarizedExperiment objects (and/or
>>> RangedSummarizedExperiment). To the best of my knowledge, a
>>> readSummarizedExperiment is currently not available, but I might be wrong?
>>> Almost equally important would be an export function so that users can
>>> easily report intermediate results and also share them with external
>>> software outside of R. Clearly, for the latter need exporting to an
>>> .Rda file is not an option.
>>> The import step in particular overlaps substantially with how we
>>> communicate with experimentalists via spreadsheets, a topic we discussed
>>> at the meeting quite a bit. Providing one or two best-practice templates
>>> of how to organize experiments in the 'spirit' of SummarizedExperiment
>>> could help to educate scientists on how to format their meta-data in
>>> Excel or Google Sheets so that it is easier to process. This would also
>>> improve reproducibility, since many sample handling errors happen right
>>> at this level. As an example file, one could use the colData sample
>>> from the SummarizedExperiment vignette.
>>> That's really all.
>>> Best,
>>> Thomas
