[Bioc-devel] Changes to the SummarizedExperiment Class

Hector Corrada Bravo hcorrada at gmail.com
Wed Mar 4 14:32:36 CET 2015


May I advocate for 'IndexedDataFrame' or 'IndexedFrame'? 'rowIndices' can
return whatever makes sense (a GRanges, or some other data structure; I'm
thinking of a taxonomy for metagenomics, for example). GRangesFrame can
inherit from this.
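
A minimal sketch of that proposal, written in Python only to keep it
self-contained (the real thing would be S4 classes). Every name below
(IndexedFrame, row_indices, nrow, GRangesFrame, row_ranges, Range) is a
hypothetical stand-in, not an existing Bioconductor API. It assumes the frame
keeps the DataFrame convention that length means number of columns, with the
row index reachable only through an accessor:

```python
from dataclasses import dataclass


class IndexedFrame:
    """Hypothetical: named columns plus a row index of any sequence type."""

    def __init__(self, columns, row_indices):
        n = len(row_indices)
        for name, col in columns.items():
            if len(col) != n:
                raise ValueError(f"column {name!r} has {len(col)} rows, expected {n}")
        self._columns = dict(columns)
        self._row_indices = row_indices

    def row_indices(self):
        # The index can be anything with a length: ranges, taxonomy labels, ...
        return self._row_indices

    def __len__(self):
        # DataFrame convention assumed here: length is the number of columns.
        return len(self._columns)

    @property
    def nrow(self):
        return len(self._row_indices)


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for one genomic range."""
    seqname: str
    start: int
    end: int


class GRangesFrame(IndexedFrame):
    """Hypothetical: an IndexedFrame whose row index must consist of ranges."""

    def __init__(self, columns, ranges):
        if not all(isinstance(r, Range) for r in ranges):
            raise TypeError("every row index entry must be a Range")
        super().__init__(columns, ranges)

    def row_ranges(self):
        return self.row_indices()


# A metagenomics-style frame indexed by taxonomy labels.
taxa = IndexedFrame(
    {"count": [10, 3]},
    ["Bacteria;Firmicutes", "Bacteria;Proteobacteria"],
)

# A range-indexed frame with three columns and two rows.
gf = GRangesFrame(
    {"score": [0.5, 0.9], "depth": [12, 7], "strand": ["+", "-"]},
    [Range("chr1", 1, 100), Range("chr2", 50, 80)],
)
```

Here len(gf) counts columns and gf.nrow counts rows, so range-based
operations would have to go through row_ranges() explicitly.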

On Wed, Mar 4, 2015 at 3:28 AM, Hervé Pagès <hpages at fredhutch.org> wrote:

> GRangesFrame is an interesting idea and I gave it some thought.
>
> There is this nice symmetry between GRanges and GRangesFrame:
>
> - GRanges = a naked GRanges + a DataFrame accessible via mcols()
>
> - GRangesFrame = a DataFrame + a naked GRanges accessible via
>                  some accessor (e.g. rowRanges())
>
> So GRanges and GRangesFrame are equivalent in terms of what they
> can hold, but different in terms of API: the former has the ranges
> API as primary API and the DataFrame API on its mcols() component,
> and the latter has the DataFrame API as primary API and the ranges
> API on its rowRanges() component. Nice switch!
>
> What does this API switch bring us? A GRangesFrame object is now
> an object that fully behaves like a DataFrame and people can also
> perform range-based operations on its rowRanges() component.
> Here is what I'm afraid is going to happen: people will also want
> to be able to perform range-based operations *directly* on
> these objects, i.e. without having to call rowRanges() first.
> So for example when they do subsetByOverlaps(), subsetting
> happens vertically. Also the Hits object returned by findOverlaps()
> would contain row indices. Problem with this is that these objects
> now start to suffer from the "dual personality syndrome". For
> example, it's not clear anymore what their length should be.
> Strictly speaking it should be their number of columns (that's
> what the length of a DataFrame is), but the ranges API that
> we're trying to put on them also makes them feel like vectors
> along the vertical dimension so it also feels that their length
> should be their number of rows. Same thing with 1D subsetting.
> Why does it subset the columns and not the rows? Most people
> are now confused.
>
> It's interesting to note that the same thing happens with GRanges
> objects, but in the opposite direction: people wish they could
> do DataFrame operations directly on them without calling mcols()
> first. But in order to preserve the good health of GRanges objects,
> we've not done that (except for $, a shortcut for mcols(x)$,
> the pressure was just too strong).
>
> H.
>
>
>
> On 03/03/2015 04:35 PM, Michael Lawrence wrote:
>
>> Should be possible for the annotations to be of any type, as long as they
>> satisfy a simple contract of NROW() and 2D "[". Then, you could have a
>> DataFrame, GRanges, or whatever in there. But it would be nice to have a
>> special class for the container with range information. The contract for
>> the range annotation would be to have a granges() method.
>>
>> I agree it would be nice if there was a way with the methods package to
>> easily assert such contracts. For example, one could define an interface
>> with a set of generics (and optionally the relevant position in the
>> generic signature). Then, once all of the methods have been assigned for
>> a particular class, it is made to inherit from that contract class.
>> There are lots of gotchas though. Not sure how useful it would be in
>> practice.
>>
>>
>> On Tue, Mar 3, 2015 at 4:07 PM, Peter Haverty <haverty.peter at gene.com>
>> wrote:
>>
>>> There are some nice similarities in these new imaginary types.  A
>>> "GRangesFrame" is a list of dimensionally identical things (columns) and
>>> some row meta-data (the GRanges).  The SE-like object is similarly a list
>>> of dimensionally like things (matrices, RleDataFrames, BigMatrix objects,
>>> HDF5-backed things) with some row meta-data (a DataFrame or
>>> GRangesFrame). Elegant?  Maybe they would actually be relatives in the
>>> class tree.
>>>
>>> I wonder if this kind of thing would be easier if we had Java-style
>>> Interfaces or duck-typing.  The "x" slot of "y" holds something that
>>> implements this set of methods ...
>>>
>>> Oh, and kinda apropos, the genoset class will probably go away or become
>>> an extension to this new SE-like thing.  The extra stuff that comes along
>>> with genoset will still be available.
>>>
>>> Pete
>>>
>>> ____________________
>>> Peter M. Haverty, Ph.D.
>>> Genentech, Inc.
>>> phaverty at gene.com
>>>
>>> On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr. <tim.triche at gmail.com>
>>> wrote:
>>>
>>>> This.
>>>>
>>>> It would be damned near perfect as a return value for assays coming out
>>>> of an object that held several such assays at several time points in a
>>>> population, where there are both assay-wise and covariate-wise "holes"
>>>> that could nonetheless be usefully imputed across assays.
>>>>
>>>>
>>>> Statistics is the grammar of science.
>>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>>>>
>>>> On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty <haverty.peter at gene.com>
>>>> wrote:
>>>>
>>>>>>> I still think GRanges should be a subclass of DataFrame,
>>>>>>> which would make this easy, but I don't seem to be winning that
>>>>>>> argument.
>>>>>>
>>>>>> Just impossible. As Michael mentioned back in November, they have
>>>>>> conflicting APIs.
>>>>>
>>>>> Maybe a new "GRangesFrame" that is a DataFrame and holds a GRanges
>>>>> (without mcols) as an index?
>>>>>
>>>>>
>>>>>          [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>
