[Bioc-devel] Changes to the SummarizedExperiment Class
Michael Love
michaelisaiahlove at gmail.com
Fri Mar 6 16:40:22 CET 2015
hi all,
just a practical issue: I have GenomicRanges version 1.19.42 on my
computer which does not have rowRanges defined, although the 1.19.42
version on the Bioc website does have rowRanges in the man page:
So I pass check locally but not in the devel branch on Bioc servers.
> library(GenomicRanges)
> rowRanges
Error: object 'rowRanges' not found
> sessionInfo()
R Under development (unstable) (2014-12-08 r67137)
Platform: x86_64-apple-darwin12.5.0 (64-bit)
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices datasets utils
methods base
other attached packages:
[1] GenomicRanges_1.19.42 GenomeInfoDb_1.3.13 IRanges_2.1.41
[5] BiocGenerics_0.13.6 RUnit_0.4.28 devtools_1.7.0 knitr_1.9
[9] BiocInstaller_1.17.5
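A quick way to tell whether the installed build actually exports the symbol is to ask the namespace directly instead of trusting the version string. This is only a sketch: `exports_symbol` is a made-up helper name, and it uses nothing beyond base R.

```r
# Hypothetical helper (not part of any package): does the installed copy of
# `pkg` export `name`? Returns NA when the package is not installed.
exports_symbol <- function(pkg, name) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
  name %in% getNamespaceExports(pkg)
}

exports_symbol("GenomicRanges", "rowRanges")

# Version of the locally installed copy, for comparison with the build report.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  print(packageVersion("GenomicRanges"))
}
```

If this returns FALSE, reinstalling the package from the devel repository may be the simplest fix.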
On Wed, Mar 4, 2015 at 3:03 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
> On 03/04/2015 10:03 AM, Peter Haverty wrote:
>> Michael has a good point. The complexity of the BioC universe of classes
>> hurts our ability to attract new users. More classes would be a minus there
>> ... but a small set of common, explicit APIs would simplify things.
>> Rectangular things implement the matrix Interface. :-) Deprecating old
>> stuff, like eSet, might help more than it hurts, on the simplicity front.
>> P.S. apropos of understanding this universe of classes, I *love* the
>> methods(class=x) thing Vincent mentioned.
> The current version, under R-devel, is at
> devtools::source_gist("https://gist.github.com/mtmorgan/9f98871adb9f0c1891a4")
> > methods(class="SummarizedExperiment")
>  [1] [                 [[                [[<-              [<-
>  [5] $                 $<-               assay             assay<-
>  [9] assayNames        assayNames<-      assays            assays<-
> [13] cbind             coerce            colData           colData<-
> [17] compare           Compare           countOverlaps     coverage
> [21] dim               dimnames          dimnames<-        disjointBins
> [25] distance          distanceToNearest duplicated        elementMetadata
> [29] elementMetadata<- end               end<-             exptData
> [33] exptData<-        extractROWS       findOverlaps      flank
> [37] follow            granges           isDisjoint        mcols
> [41] mcols<-           narrow            nearest           order
> [45] overlapsAny       precede           ranges            ranges<-
> [49] rank              rbind             replaceROWS       resize
> [53] restrict          rowData           rowData<-         seqinfo
> [57] seqinfo<-         seqnames          shift             show
> [61] sort              split             start             start<-
> [65] strand            strand<-          subset            subsetByOverlaps
> [69] updateObject      values            values<-          width
> [73] width<-
> see ?"methods" for accessing help and source code
> and
> > head(attr(methods(class="SummarizedExperiment"), "info"))
>                                                              generic visible
> [,SummarizedExperiment,ANY-method                                  [    TRUE
> [[,SummarizedExperiment,ANY,missing-method                        [[    TRUE
> [[<-,SummarizedExperiment,ANY,missing-method                    [[<-    TRUE
> [<-,SummarizedExperiment,ANY,ANY,SummarizedExperiment-method     [<-    TRUE
> $,SummarizedExperiment-method                                      $    TRUE
> $<-,SummarizedExperiment-method                                  $<-    TRUE
>                                                              isS4          from
> [,SummarizedExperiment,ANY-method                            TRUE GenomicRanges
> [[,SummarizedExperiment,ANY,missing-method                   TRUE GenomicRanges
> [[<-,SummarizedExperiment,ANY,missing-method                 TRUE GenomicRanges
> [<-,SummarizedExperiment,ANY,ANY,SummarizedExperiment-method TRUE GenomicRanges
> $,SummarizedExperiment-method                                TRUE GenomicRanges
> $<-,SummarizedExperiment-method                              TRUE GenomicRanges
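The same call works on any class, so it can be tried without GenomicRanges installed. A minimal sketch on a toy S4 class, assuming an R version in which methods(class=) reports S4 methods (the behaviour shown above); `Shape` and `area` are made-up names.

```r
library(methods)

# Toy class and generic, purely illustrative.
setClass("Shape", representation(side = "numeric"))
setGeneric("area", function(shape) standardGeneric("area"))
setMethod("area", "Shape", function(shape) shape@side^2)

# List every method defined for the class, then inspect the "info" attribute,
# a data.frame with one row per method.
m <- methods(class = "Shape")
info <- attr(m, "info")
info[, c("generic", "isS4", "from")]
```

The "from" column shows where each method was defined, which helps when several packages contribute methods for one class.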
> Martin
>> Pete
>> ____________________
>> Peter M. Haverty, Ph.D.
>> Genentech, Inc.
>> phaverty at gene.com
>> On Wed, Mar 4, 2015 at 9:38 AM, Michael Lawrence <lawrence.michael at gene.com>
>> wrote:
>>> I think we need to make sure that there are enough benefits of something
>>> like GRangesFrame before we introduce yet another complicated and
>>> overlapping data structure into the framework. Prior to summarization, the
>>> ranges seem primary, after summarization, it may often make sense for them
>>> to be secondary. But I'm just not sure what we gain from a new data
>>> structure.
>>> On Wed, Mar 4, 2015 at 12:28 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>>> GRangesFrame is an interesting idea and I gave it some thought.
>>>> There is this nice symmetry between GRanges and GRangesFrame:
>>>> - GRanges = a naked GRanges + a DataFrame accessible via mcols()
>>>> - GRangesFrame = a DataFrame + a naked GRanges accessible via
>>>> some accessor (e.g. rowRanges())
>>>> So GRanges and GRangesFrame are equivalent in terms of what they
>>>> can hold, but different in terms of API: the former has the ranges
>>>> API as primary API and the DataFrame API on its mcols() component,
>>>> and the latter has the DataFrame API as primary API and the ranges
>>>> API on its rowRanges() component. Nice switch!
>>>> What does this API switch bring us? A GRangesFrame object is now
>>>> an object that fully behaves like a DataFrame and people can also
>>>> perform range-based operations on its rowRanges() component.
>>>> Here is what I'm afraid is going to happen: people will also want
>>>> to be able to perform range-based operations *directly* on
>>>> these objects, i.e. without having to call rowRanges() first.
>>>> So for example when they do subsetByOverlaps(), subsetting
>>>> happens vertically. Also the Hits object returned by findOverlaps()
>>>> would contain row indices. Problem with this is that these objects
>>>> now start to suffer from the "dual personality syndrome". For
>>>> example, it's not clear anymore what their length should be.
>>>> Strictly speaking it should be their number of columns (that's
>>>> what the length of a DataFrame is), but the ranges API that
>>>> we're trying to put on them also makes them feel like vectors
>>>> along the vertical dimension so it also feels that their length
>>>> should be their number of rows. Same thing with 1D subsetting.
>>>> Why does it subset the columns and not the rows? Most people
>>>> are now confused.
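Base R's data.frame follows the same column-oriented convention described here for DataFrame, so it can stand in for the point about length and 1D subsetting. This is a sketch only: the column names are made up, and data.frame is used instead of DataFrame to avoid any package dependency.

```r
df <- data.frame(score = c(0.5, 1.2, 3.4), gene = c("a", "b", "c"))

length(df)   # counts columns, not rows
nrow(df)     # the "vertical" length a ranges API would expect

df[1]        # 1D subsetting selects the first column
df[1, ]      # selecting the first row needs the 2D form
```

An object that also answered to a ranges API would have to pick one of these two meanings for length() and for 1D "[".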
>>>> It's interesting to note that the same thing happens with GRanges
>>>> objects, but in the opposite direction: people wish they could
>>>> do DataFrame operations directly on them without calling mcols()
>>>> first. But in order to preserve the good health of GRanges objects,
>>>> we've not done that (except for $, a shortcut for mcols(x)$,
>>>> the pressure was just too strong).
>>>> H.
>>>> On 03/03/2015 04:35 PM, Michael Lawrence wrote:
>>>>> Should be possible for the annotations to be of any type, as long as they
>>>>> satisfy a simple contract of NROW() and 2D "[". Then, you could have a
>>>>> DataFrame, GRanges, or whatever in there. But it would be nice to have a
>>>>> special class for the container with range information. The contract for
>>>>> the range annotation would be to have a granges() method.
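The NROW() plus 2D "[" contract can be checked at run time with base R alone. A rough sketch, where `satisfies_row_contract` is a hypothetical name and the check is deliberately shallow (it only tries to select the first row):

```r
# Hypothetical checker for the contract described above: NROW() must work and
# 2D "[" must be able to select rows.
satisfies_row_contract <- function(x) {
  tryCatch({
    i <- head(seq_len(NROW(x)), 1L)          # first row index, if any
    NROW(x[i, , drop = FALSE]) == length(i)  # 2D "[" must return that many rows
  }, error = function(e) FALSE)
}

satisfies_row_contract(data.frame(a = 1:3))
satisfies_row_contract(matrix(1:6, nrow = 2))
satisfies_row_contract(1:3)   # plain vectors have no 2D "["
```

A container could run such a check in its validity method instead of requiring a particular annotation class.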
>>>>> I agree it would be nice if there was a way with the methods package to
>>>>> easily assert such contracts. For example, one could define an interface
>>>>> with a set of generics (and optionally the relevant position in the
>>>>> generic signature). Then, once all of the methods have been assigned for
>>>>> a particular class, it is made to inherit from that contract class.
>>>>> There are lots of gotchas though. Not sure how useful it would be in
>>>>> practice.
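One piece of this can already be approximated with the methods package: treat an "interface" as a character vector of generic names and ask whether a class has a method for each. The sketch below only performs the check; it does not make the class inherit from a contract class, and every name in it (`nrows`, `rowSubset`, `Annot`, `implements`) is invented for illustration.

```r
library(methods)

# Hypothetical "interface": a set of generics a class must implement.
setGeneric("nrows", function(x) standardGeneric("nrows"))
setGeneric("rowSubset", function(x, i) standardGeneric("rowSubset"),
           signature = "x")
row_interface <- c("nrows", "rowSubset")

# TRUE when `cls` has a directly defined method for every generic listed.
implements <- function(cls, generics) {
  all(vapply(generics, function(g) hasMethod(g, cls), logical(1)))
}

setClass("Annot", representation(df = "data.frame"))
setMethod("nrows", "Annot", function(x) nrow(x@df))
before <- implements("Annot", row_interface)   # rowSubset is still missing

setMethod("rowSubset", "Annot",
          function(x, i) new("Annot", df = x@df[i, , drop = FALSE]))
after <- implements("Annot", row_interface)    # both generics now covered
```

hasMethod() ignores inherited methods unless called with inherited = TRUE, which is one of the gotchas a real implementation would have to settle.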
>>>>> On Tue, Mar 3, 2015 at 4:07 PM, Peter Haverty <haverty.peter at gene.com>
>>>>> wrote:
>>>>>> There are some nice similarities in these new imaginary types. A
>>>>>> "GRangesFrame" is a list of dimensionally identical things (columns) and
>>>>>> some row meta-data (the GRanges). The SE-like object is similarly a list
>>>>>> of dimensionally like things (matrices, RleDataFrames, BigMatrix objects,
>>>>>> HDF5-backed things) with some row meta-data (a DataFrame or GRangesFrame).
>>>>>> Elegant? Maybe they would actually be relatives in the class tree.
>>>>>> I wonder if this kind of thing would be easier if we had Java-style
>>>>>> Interfaces or duck-typing. The "x" slot of "y" holds something that
>>>>>> implements this set of methods ...
>>>>>> Oh, and kinda apropos, the genoset class will probably go away or become
>>>>>> an extension to this new SE-like thing. The extra stuff that comes along
>>>>>> with genoset will still be available.
>>>>>> Pete
>>>>>> ____________________
>>>>>> Peter M. Haverty, Ph.D.
>>>>>> Genentech, Inc.
>>>>>> phaverty at gene.com
>>>>>> On Tue, Mar 3, 2015 at 3:42 PM, Tim Triche, Jr. <tim.triche at gmail.com>
>>>>>> wrote:
>>>>>>> This.
>>>>>>> It would be damned near perfect as a return value for assays coming out
>>>>>>> of an object that held several such assays at several time points in a
>>>>>>> population, where there are both assay-wise and covariate-wise "holes"
>>>>>>> that could nonetheless be usefully imputed across assays.
>>>>>>> Statistics is the grammar of science.
>>>>>>> Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
>>>>>>> On Tue, Mar 3, 2015 at 3:25 PM, Peter Haverty <haverty.peter at gene.com>
>>>>>>> wrote:
>>>>>>>>>> I still think GRanges should be a subclass of DataFrame,
>>>>>>>>>> which would make this easy, but I don't seem to be winning that
>>>>>>>>>> argument.
>>>>>>>>> Just impossible. As Michael mentioned back in November, they have
>>>>>>>>> conflicting APIs.
>>>>>>>> Maybe a new "GRangesFrame" that is a DataFrame and holds a GRanges
>>>>>>>> (without mcols) as an index?
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>> _______________________________________________
>>>>>>>> Bioc-devel at r-project.org mailing list
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>> --
>>>> Hervé Pagès
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>> E-mail: hpages at fredhutch.org
>>>> Phone: (206) 667-5791
>>>> Fax: (206) 667-1319
> --
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793