[Bioc-devel] Changes to the SummarizedExperiment Class

Valerie Obenchain vobencha at fredhutch.org
Fri Mar 6 16:59:30 CET 2015


Hi Mike,

Our error: we didn't bump the GenomicRanges version when rowRanges was
added. Hopefully 1.19.43 will propagate today and things will be sorted out.
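
In the meantime, a rough way to check from R whether the installed build
is new enough might look like the sketch below. The helper name has_export
is made up for illustration, and the 1.19.43 threshold is only the version
mentioned above, on the assumption that it is the first build to ship
rowRanges.

```r
# Hypothetical helper (not part of any package): TRUE if `pkg` is installed
# and exports an object called `name`, FALSE otherwise.
has_export <- function(pkg, name) {
    requireNamespace(pkg, quietly = TRUE) &&
        name %in% getNamespaceExports(pkg)
}

# Assumed threshold: the version expected to propagate with rowRanges.
needed <- package_version("1.19.43")

if (requireNamespace("GenomicRanges", quietly = TRUE)) {
    installed <- packageVersion("GenomicRanges")
    message("GenomicRanges ", installed,
            if (installed < needed) " is older than " else " is at least ",
            needed)
    message("rowRanges exported: ", has_export("GenomicRanges", "rowRanges"))
} else {
    message("GenomicRanges is not installed")
}
```

Checking the exports as well as the version number guards against the
case described below, where two builds carry the same version but differ
in content.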

Val


On 03/06/2015 07:40 AM, Michael Love wrote:
> hi all,
>
> just a practical issue: I have GenomicRanges version 1.19.42 on my
> computer which does not have rowRanges defined, although the 1.19.42
> version on the Bioc website does have rowRanges in the man page:
>
> http://master.bioconductor.org/packages/3.1/bioc/html/GenomicRanges.html
>
> So I pass check locally but not in the devel branch on Bioc servers.
>
> > library(GenomicRanges)
> > rowRanges
> Error: object 'rowRanges' not found
> > sessionInfo()
> R Under development (unstable) (2014-12-08 r67137)
> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
> [1] GenomicRanges_1.19.42 GenomeInfoDb_1.3.13   IRanges_2.1.41        S4Vectors_0.5.21
> [5] BiocGenerics_0.13.6   RUnit_0.4.28          devtools_1.7.0        knitr_1.9
> [9] BiocInstaller_1.17.5
>
>
>
> On Wed, Mar 4, 2015 at 3:03 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
>>
>> On 03/04/2015 10:03 AM, Peter Haverty wrote:
>>>
>>> Michael has a good point. The complexity of the BioC universe of classes
>>> hurts our ability to attract new users. More classes would be a minus there
>>> ... but a small set of common, explicit APIs would simplify things.
>>> Rectangular things implement the matrix Interface.  :-) Deprecating old
>>> stuff, like eSet, might help more than it hurts, on the simplicity front.
>>>
>>> P.S. apropos of understanding this universe of classes, I *love* the
>>> methods(class=x) thing Vincent mentioned.
>>
>>
>> The current version, under R-devel, is at
>>
>>    devtools::source_gist("https://gist.github.com/mtmorgan/9f98871adb9f0c1891a4")
>>
>>    > methods(class="SummarizedExperiment")
>>     [1] [                 [[                [[<-              [<-
>>     [5] $                 $<-               assay             assay<-
>>     [9] assayNames        assayNames<-      assays            assays<-
>>    [13] cbind             coerce            colData           colData<-
>>    [17] compare           Compare           countOverlaps     coverage
>>    [21] dim               dimnames          dimnames<-        disjointBins
>>    [25] distance          distanceToNearest duplicated        elementMetadata
>>    [29] elementMetadata<- end               end<-             exptData
>>    [33] exptData<-        extractROWS       findOverlaps      flank
>>    [37] follow            granges           isDisjoint        mcols
>>    [41] mcols<-           narrow            nearest           order
>>    [45] overlapsAny       precede           ranges            ranges<-
>>    [49] rank              rbind             replaceROWS       resize
>>    [53] restrict          rowData           rowData<-         seqinfo
>>    [57] seqinfo<-         seqnames          shift             show
>>    [61] sort              split             start             start<-
>>    [65] strand            strand<-          subset            subsetByOverlaps
>>    [69] updateObject      values            values<-          width
>>    [73] width<-
>>
>>    see ?"methods" for accessing help and source code
>>
>> and
>>
>>    > head(attr(methods(class="SummarizedExperiment"), "info"))
>>                                                               generic visible
>> [,SummarizedExperiment,ANY-method                                  [    TRUE
>> [[,SummarizedExperiment,ANY,missing-method                        [[    TRUE
>> [[<-,SummarizedExperiment,ANY,missing-method                    [[<-    TRUE
>> [<-,SummarizedExperiment,ANY,ANY,SummarizedExperiment-method     [<-    TRUE
>> $,SummarizedExperiment-method                                      $    TRUE
>> $<-,SummarizedExperiment-method                                  $<-    TRUE
>>                                                               isS4          from
>> [,SummarizedExperiment,ANY-method                            TRUE GenomicRanges
>> [[,SummarizedExperiment,ANY,missing-method                   TRUE GenomicRanges
>> [[<-,SummarizedExperiment,ANY,missing-method                 TRUE GenomicRanges
>> [<-,SummarizedExperiment,ANY,ANY,SummarizedExperiment-method TRUE GenomicRanges
>> $,SummarizedExperiment-method                                TRUE GenomicRanges
>> $<-,SummarizedExperiment-method                              TRUE GenomicRanges
>>
>> Martin
>>
>>>
>>> Pete
>>>
>>> ____________________
>>> Peter M. Haverty, Ph.D.
>>> Genentech, Inc.
>>> phaverty at gene.com
>>>
>>> On Wed, Mar 4, 2015 at 9:38 AM, Michael Lawrence <lawrence.michael at gene.com>
>>> wrote:
>>>
>>>> I think we need to make sure that there are enough benefits of something
>>>> like GRangesFrame before we introduce yet another complicated and
>>>> overlapping data structure into the framework. Prior to summarization, the
>>>> ranges seem primary, after summarization, it may often make sense for them
>>>> to be secondary. But I'm just not sure what we gain from a new data
>>>> structure.
>>>>
>>>> On Wed, Mar 4, 2015 at 12:28 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>>>
>>>>> GRangesFrame is an interesting idea and I gave it some thought.
>>>>>
>>>>> There is this nice symmetry between GRanges and GRangesFrame:
>>>>>
>>>>> - GRanges = a naked GRanges + a DataFrame accessible via mcols()
>>>>>
>>>>> - GRangesFrame = a DataFrame + a naked GRanges accessible via
>>>>>                    some accessor (e.g. rowRanges())
>>>>>
>>>>> So GRanges and GRangesFrame are equivalent in terms of what they
>>>>> can hold, but different in terms of API: the former has the ranges
>>>>> API as primary API and the DataFrame API on its mcols() component,
>>>>> and the latter has the DataFrame API as primary API and the ranges
>>>>> API on its rowRanges() component. Nice switch!
>>>>>
>>>>> What does this API switch bring us? A GRangesFrame object is now
>>>>> an object that fully behaves like a DataFrame and people can also
>>>>> perform range-based operations on its rowRanges() component.
>>>>> Here is what I'm afraid is going to happen: people will also want
>>>>> to be able to perform range-based operations *directly* on
>>>>> these objects, i.e. without having to call rowRanges() first.
>>>>> So for example when they do subsetByOverlaps(), subsetting
>>>>> happens vertically. Also the Hits object returned by findOverlaps()
>>>>> would contain row indices. Problem with this is that these objects
>>>>> now start to suffer from the "dual personality syndrome". For
>>>>> example, it's not clear anymore what their length should be.
>>>>> Strictly speaking it should be their number of columns (that's
>>>>> what the length of a DataFrame is), but the ranges API that
>>>>> we're trying to put on them also makes them feel like vectors
>>>>> along the vertical dimension so it also feels that their length
>>>>> should be their number of rows. Same thing with 1D subsetting.
>>>>> Why does it subset the columns and not the rows? Most people
>>>>> are now confused.
>>>>>
>>>>> It's interesting to note that the same thing happens with GRanges
>>>>> objects, but in the opposite direction: people wish they could
>>>>> do DataFrame operations directly on them without calling mcols()
>>>>> first. But in order to preserve the good health of GRanges objects,
>>>>> we've not done that (except for $, a shortcut for mcols(x)$,
>>>>> the pressure was just too strong).
>>>>>
>>>>> H.
>>>>>
>>>>>
>>>>>
>>>>> On 03/03/2015 04:35 PM, Michael Lawrence wrote:
>>>>>
>>>>>> Should be possible for the annotations to be of any type, as long as they
>>>>>> satisfy a simple contract of NROW() and 2D "[". Then, you could have a
>>>>>> DataFrame, GRanges, or whatever in there. But it would be nice to have a
>>>>>> special class for the container with range information. The contract for
>>>>>> the range annotation would be to have a granges() method.
>>>>>>
>>>>>> I agree it would be nice if there was a way with the methods package to
>>>>>> easily assert such contracts. For example, one could define an interface
>>>>>> with a set of generics (and optionally the relevant position in the
>>>>>> generic signature). Then, once all of the methods have been assigned for
>>>>>> a particular class, it is made to inherit from that contract class.
>>>>>> There are lots of gotchas though. Not sure how useful it would be in
>>>>>> practice.
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 3, 2015 at 4:07 PM, Peter Haverty <haverty.peter at gene.com>
>>>>>> wrote:
>>>>>>
>>>>>>> There are some nice similarities in these new imaginary types.  A
>>>>>>> "GRangesFrame" is a list of dimensionally identical things (columns) and
>>>>>>> some row meta-data (the GRanges).  The SE-like object is similarly a list
>>>>>>> of dimensionally like things (matrices, RleDataFrames, BigMatrix objects,
>>>>>>> HDF5-backed things) with some row meta-data (a DataFrame or GRangesFrame).
>>>>>>> Elegant?  Maybe they would actually be relatives in the class tree.
>>>>>>>
>>>>>>> I wonder if this kind of thing would be easier if we had Java-style
>>>>>>> Interfaces or duck-typing.  The "x" slot of "y" holds something that
>>>>>>> implements this set of methods ...
>>>>>>>
>>>>>>> Oh, and kinda apropos, the genoset class will probably go away or become
>>>>>>> an extension to this new SE-like thing.  The extra stuff that comes along
>>>>>>> with genoset will still be available.
>>>>>>>
>>>>>>> Pete
>>>>>>>
>>>>>>> ____________________
>>>>>>> Peter M. Haverty, Ph.D.
>>>>>>> Genentech, Inc.
>>>>>>> phaverty at gene.com
>>>>>>>
>>>>>>> On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr. <tim.triche at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> This.
>>>>>>>>
>>>>>>>> It would be damned near perfect as a return value for assays coming out
>>>>>>>> of an object that held several such assays at several time points in a
>>>>>>>> population, where there are both assay-wise and covariate-wise "holes"
>>>>>>>> that could nonetheless be usefully imputed across assays.
>>>>>>>>
>>>>>>>>
>>>>>>>> Statistics is the grammar of science.
>>>>>>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>>>>>>>>
>>>>>>>> On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty <haverty.peter at gene.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>>>> I still think GRanges should be a subclass of DataFrame,
>>>>>>>>>>> which would make this easy, but I don't seem to be winning that
>>>>>>>>>>> argument.
>>>>>>>>>>
>>>>>>>>>> Just impossible. As Michael mentioned back in November, they have
>>>>>>>>>> conflicting APIs.
>>>>>>>>>
>>>>>>>>> Maybe a new "GRangesFrame" that is a DataFrame and holds a GRanges
>>>>>>>>> (without mcols) as an index?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>            [[alternative HTML version deleted]]
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> Hervé Pagès
>>>>>
>>>>> Program in Computational Biology
>>>>> Division of Public Health Sciences
>>>>> Fred Hutchinson Cancer Research Center
>>>>> 1100 Fairview Ave. N, M1-B514
>>>>> P.O. Box 19024
>>>>> Seattle, WA 98109-1024
>>>>>
>>>>> E-mail: hpages at fredhutch.org
>>>>> Phone:  (206) 667-5791
>>>>> Fax:    (206) 667-1319
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>>
>>
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: vobencha at fredhutch.org
Phone: (206) 667-3158


