[Bioc-devel] SummarizedExperiment vs ExpressionSet

Michael Lawrence lawrence.michael at gene.com
Wed Nov 26 18:53:01 CET 2014


GRangesList is very compact, so this would definitely get the job done. But
having an empty range is not the same as an NA, nor does it mean that ranges
are "irrelevant". There are definitely times, especially as we extend
beyond genomics, when we need something more general, as Pete suggests.
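
To make the distinction concrete, here is a minimal sketch in base R rather
than GenomicRanges (an assumption, so that it runs without Bioconductor;
plain integer vectors stand in for ranges). A zero-length element says "no
ranges recorded here", whereas NA says "the value is unknown":

```r
# Base R stand-ins, NOT the GenomicRanges API:
# a zero-length vector plays the role of an empty set of ranges,
# NA plays the role of a missing value.
empty_ranges   <- integer(0)
missing_ranges <- NA_integer_

length(empty_ranges)    # 0: nothing recorded
length(missing_ranges)  # 1: one value, but unknown

# A list of empty elements (analogous to a GRangesList of empty GRanges)
# cannot flag "unknown" or "not applicable": no element contains an NA.
row_ranges  <- list(a = integer(0), b = integer(0))
any_missing <- vapply(row_ranges, function(x) anyNA(x), logical(1))
```

The same ambiguity is what makes "empty" a poor substitute for "irrelevant"
when the rows are not genomic features at all.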

As an aside, I think there is an interesting structural relationship between
something like an eSet and a pivot table in a spreadsheet, except an eSet
has multiple measurement tables and the column/row annotations are not just
for aggregation. If we start to think more broadly, we should consider such
specializations and try to unify them into a single framework.
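
As a rough illustration of the kind of general structure being discussed,
here is a base R sketch of several measurement tables that share data.frame
annotations on rows and columns. The names make_table_set and subset_rows
are hypothetical helpers invented for this example; they are not part of
Biobase, GenomicRanges, or any other package:

```r
# Hypothetical helper (not a Bioconductor API): bundle several measurement
# tables with data.frame annotations on their rows and columns.
make_table_set <- function(assays, row_data, col_data) {
  for (m in assays) {
    # Every table must match the annotation dimensions.
    stopifnot(is.matrix(m),
              nrow(m) == nrow(row_data),
              ncol(m) == nrow(col_data))
  }
  structure(list(assays = assays, row_data = row_data, col_data = col_data),
            class = "table_set")
}

# Subsetting rows keeps every table and the row annotation in sync.
subset_rows <- function(x, i) {
  x$assays   <- lapply(x$assays, function(m) m[i, , drop = FALSE])
  x$row_data <- x$row_data[i, , drop = FALSE]
  x
}

counts <- matrix(1:6, nrow = 3,
                 dimnames = list(letters[1:3], c("s1", "s2")))
tset <- make_table_set(
  assays   = list(counts = counts, scaled = counts / 10),
  row_data = data.frame(kind = c("x", "y", "x"), row.names = letters[1:3]),
  col_data = data.frame(group = c("ctl", "trt"), row.names = c("s1", "s2"))
)
first_two <- subset_rows(tset, 1:2)
```

Nothing here requires the row annotation to be a set of ranges; a ranges-based
class could then be one specialization of such a container.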



On Wed, Nov 26, 2014 at 9:37 AM, Tim Triche, Jr. <tim.triche at gmail.com>
wrote:

> so as a simple experiment, I did the following:
>
> library(GenomicRanges)
> bar <- matrix(rnorm(100), ncol=10)
> colnames(bar) <- as.character(1:10)
> rownames(bar) <- letters[1:10]
> foo <- SummarizedExperiment(assays=list(bar=bar))
>
> rowData(foo)
> ## GRangesList object of length 10:
> ## $a
> ## GRanges object with 0 ranges and 0 metadata columns:
> ##    seqnames    ranges strand
> ##       <Rle> <IRanges>  <Rle>
> ##
> ## $b
> ## GRanges object with 0 ranges and 0 metadata columns:
> ##      seqnames ranges strand
> ##
> ## $c
> ## GRanges object with 0 ranges and 0 metadata columns:
> ##      seqnames ranges strand
> ##
> ## ...
> ## <7 more elements>
>
> colData(foo)
> ## DataFrame with 10 rows and 0 columns
>
> This got me to thinking, why not have an emptyRanges class, or else the
> ability to index a bunch of NULL ranges without a lot of hoohah?  The
> defaults mostly do what they're supposed to; why not have a compact
> representation of empty rowData as for empty colData (i.e., a DataFrame
> with 0 columns)?  Or is a GRangesList of empty GRanges as compact as it is
> practicable to get for this purpose?
>
> Just pondering what the lowest-impact solution to the problem at hand might
> be.
>
>
> Statistics is the grammar of science.
> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>
> On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty <haverty.peter at gene.com>
> wrote:
>
> > Hi all,
> >
> > I believe there is a strong need for an object that organizes a
> > collection of rectangular data (matrices, etc.) with metadata on the
> > rows and columns.  Can SummarizedExperiment inherit from something
> > simpler that has a DataFrame as rowData?  (I believe GenomicRanges
> > should inherit from DataTable, rather than Vector, and subset as
> > x[i,j], but maybe that's getting a bit off topic.)  I often see people
> > stuffing arbitrary data into an ExpressionSet and calling one of the
> > assays "exprs" as a work-around.
> >
> > Regards,
> >
> > Pete
> >
> > ____________________
> > Peter M. Haverty, Ph.D.
> > Genentech, Inc.
> > phaverty at gene.com
> >
> > On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto <lg390 at cam.ac.uk> wrote:
> >
> > >
> > > On 26 November 2014 14:59, Wolfgang Huber wrote:
> > >
> > > > A colleague and I are designing a package for quantitative proteomics
> > > > data, and we are debating whether to base it on the
> > > > SummarizedExperiment or the ExpressionSet class.
> > > >
> > > > There is no immediate use for the ranges aspect of
> > > > SummarizedExperiment, so that would have to be carried around with
> > > > NAs, and this is a parsimony argument for using ExpressionSet
> > > > instead. OTOH, the interface of SummarizedExperiment is cleaner, its
> > > > code more modern and more likely to be updated, and users of the
> > > > Bioconductor project are likely to benefit from having to deal with a
> > > > single interface that works the same or similarly across packages,
> > > > rather than a variety of formats; which argues that new packages
> > > > should converge towards SummarizedExperiment('s interface).
> > > >
> > > > Are there any pertinent insights from this group?
> > >
> > > Instead of ExpressionSet, you could use MSnbase::MSnSet, which is
> > > essentially an ExpressionSet for quantitative proteomics (i.e., it has
> > > a MIAPE slot instead of a MIAME one, for example).
> > >
> > > Ideally, a SummarizedExperiment for proteomics would use
> > > peptide/protein ranges, which is in the pipeline, as far as I am
> > > concerned. When that becomes available, there should be infrastructure
> > > to coerce an MSnSet (and/or other relevant data) into a
> > > SummarizedExperiment.
> > >
> > > Hope this helps.
> > >
> > > Best wishes,
> > >
> > > Laurent
> > >
> > > > Thanks and best wishes
> > > > Wolfgang
> > > >
> > > > _______________________________________________
> > > > Bioc-devel at r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > >
> > > --
> > > Laurent Gatto
> > > http://cpu.sysbiol.cam.ac.uk/
> > >