[Bioc-devel] RangedSummarizedExperiment

Thu Jun 18 21:18:21 CEST 2015

Hi Tim,

On 06/18/2015 10:48 AM, Tim Triche, Jr. wrote:
> Hey since the refactoring is already breaking stuff willy nilly, can I make a few suggestions?
>
> 1) please for the love of all that is holy have backwards compatible methods for RSEs.  It's excruciating to have RSE as the target class, supporting RELEASE users with SE, and have to do endless duck typing nonsense or else have a bunch of new generics.

That sounds apocalyptic ;-) Will add the "exptData" method for
RangedSummarizedExperiment objects today (it issues a deprecation
warning though). Hope that will help reduce the "endless duck typing
nonsense".
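
For illustration only, a minimal sketch of what a backward-compatible
accessor of this kind can look like: the old name stays callable, issues
a deprecation warning, and forwards to the new accessor. The class and
generic names (ToyContainer, toyMetadata, toyExptData) are hypothetical
stand-ins written with the base "methods" package; this is not the code
in the SummarizedExperiment package.

```r
library(methods)

# Toy stand-in for an RSE-like container; NOT the real class.
setClass("ToyContainer", slots = c(metadata = "list"))

# New-style accessor (hypothetical name, chosen to avoid real generics).
setGeneric("toyMetadata", function(x) standardGeneric("toyMetadata"))
setMethod("toyMetadata", "ToyContainer", function(x) x@metadata)

# Old-style accessor kept for backward compatibility:
# warn about the deprecation, then forward to the new accessor.
setGeneric("toyExptData", function(x) standardGeneric("toyExptData"))
setMethod("toyExptData", "ToyContainer", function(x) {
    .Deprecated("toyMetadata", old = "toyExptData")
    toyMetadata(x)
})

x <- new("ToyContainer", metadata = list(lab = "demo"))

# Old code keeps working; the warning is silenced here only for the demo.
old <- suppressWarnings(toyExptData(x))
```

Code that targets both release and devel can then call the old accessor
everywhere and only see a warning on the newer class.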

Just to clarify though: generally speaking, if you expect switching
between release and devel to be completely transparent, then you're
setting your expectations too high.

>
> 2) please investigate some sort of "overlay" approach that would allow for accordion-style bundling/unbundling of transcripts, regions, compartments etc.  the reason for this will become ever more obvious, but now that I have students, I don't want to explain why everything seems to derive from a pre-ASE, pre-intronretention, pre-graphreference mindset of 20 years ago.  If you're going to break stuff, how about we break it real good and make things going forward flexible so as to eventually get it "right enough" (NOT "perfect" or "right" but "close enough for government work" and "close enough to work out of core")

You're going to have to give a lot more details about this. In
particular please explain what "accordion-style bundling/unbundling
of transcripts, regions, compartments" is, why you need it, and
how RSE would benefit from supporting that. FWIW note that you can
add arbitrary metadata columns to the rowRanges component of an RSE,
so you can always use them to store whatever bundling/grouping
information you want. Alternatively you can extend the RSE class
to achieve the same goal. Keeping RSE as simple as possible and
agnostic about complex scenarios is actually a feature.
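
A rough sketch of the two options mentioned above, under loud
assumptions: ToyExperiment and ToyBundledExperiment are hypothetical toy
classes built with the base "methods" package, a plain data.frame
(rowInfo) stands in for the per-row metadata columns, and "bundling" is
taken to mean summing transcript-level counts per gene. None of this is
the real RSE API.

```r
library(methods)

# Toy container: 'rowInfo' plays the role of per-row metadata columns.
setClass("ToyExperiment",
         slots = c(counts = "matrix", rowInfo = "data.frame"))

counts <- matrix(1:8, nrow = 4,
                 dimnames = list(paste0("tx", 1:4), c("s1", "s2")))
x <- new("ToyExperiment", counts = counts,
         rowInfo = data.frame(row.names = rownames(counts)))

# Option 1: store the grouping as an extra metadata column.
x@rowInfo$gene <- c("geneA", "geneA", "geneB", "geneB")

# "Bundle" transcripts into genes on demand by summing counts per group.
bundled <- rowsum(x@counts, group = x@rowInfo$gene)

# Option 2: a subclass that records which column defines the bundling.
setClass("ToyBundledExperiment", contains = "ToyExperiment",
         slots = c(bundleBy = "character"))
y <- new("ToyBundledExperiment", x, bundleBy = "gene")
bundled2 <- rowsum(y@counts, group = y@rowInfo[[y@bundleBy]])
```

Either way the base container stays unaware of what the grouping means;
only the code that consumes the column (or the subclass) needs to know.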

Also I'm not sure what the "pre-ASE, pre-intronretention,
pre-graphreference mindset" was 20 years ago but note that
SummarizedExperiment was designed and implemented less than
5 years ago and with the initial purpose of addressing the
needs of RNA-seq, ChIP-seq, and other NGS experiments.

>
> 3) I'll write patches ( you may not want to actually accept them, but I'll write 'em just the same ) when the urge moves me, but if some sort of a 30000' summary of the desired end product (along with deficiencies of the current SE) were readily available, it might help avoid a "second system effect".  The original SE was very good, fixing almost everything that sucked about the battle-tested ExpressionSet.  The new RSE has the great feature of automatically putting transcripts about where they belong if I point it at, say, a GEO GSE.  I assume that with things like Kallisto we can eventually do the same with arbitrary SRA experiments (there's a cute hack we are pushing in BaseSpace to make this happen already). But without a roadmap, it's tougher for people to see what needs to NOT be done, and that's really important.  What belongs in the base class, and what in a subclass?  This is not unimportant.

No detailed roadmap but the main goal of this refactoring is to
have a degraded version of the classic SummarizedExperiment, that
is, one with no rowRanges component (the need for it was discussed
on this list last year I think). We thought this refactoring would
also be a good time to migrate SummarizedExperiment to its own
package. But we had no intention of modifying the existing
functionality of the classic SE (and except for the replacement of
exptData with metadata, RSE passes the unit tests of the classic SE).
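
To make the shape of that hierarchy concrete, here is a toy sketch (an
illustration under assumptions, not the package code): a range-less base
class, a ranged subclass deriving from it, and length()/names() defined
once on the base class so that length(x) = nrow(x) and names(x) =
rownames(x), as described in the earlier message quoted below. ToySE0
and ToyRSE are hypothetical names, and a data.frame stands in for
genomic ranges.

```r
library(methods)

# Base class without ranges (toy analogue of the "degraded" container).
setClass("ToySE0", slots = c(assay = "matrix"))

# Ranged subclass: adds per-row coordinates on top of the base class.
setClass("ToyRSE", contains = "ToySE0",
         slots = c(rowRanges = "data.frame"))

# Vector-like behaviour defined once on the base class and inherited.
setMethod("length", "ToySE0", function(x) nrow(x@assay))
setMethod("names", "ToySE0", function(x) rownames(x@assay))

m <- matrix(0, nrow = 3, ncol = 2,
            dimnames = list(c("a", "b", "c"), NULL))
se0 <- new("ToySE0", assay = m)

# Build the ranged object from the base object plus the extra slot.
rse <- new("ToyRSE", se0,
           rowRanges = data.frame(chr = "chr1",
                                  start = c(1, 50, 90),
                                  end = c(10, 60, 99),
                                  row.names = rownames(m)))
```

A package class that extends the ranged class can then be created from a
range-less object first and filled in later, which is the situation
described for roar.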

As is often the case with software development, we knew pretty
well where we wanted to go but we didn't know exactly how we wanted
to get there. Hence no detailed roadmap. We thought it was not a
big deal anyway because our plan is to fix what we break, so
developers don't really need to worry about the gory details of
the changes. As I said earlier, things should be fixed and the
build report back to "normal" before the end of the week.

Thanks for your patience,
H.

>
> All of the above said, SE was great and RSE is already better in some respects. But with a clear roadmap and more input, I bet it (and a tight clean definition of what it is and isn't supposed to do) would be better-er.
>
> (Steps off soapbox)
>
> --t
>
>> On Jun 18, 2015, at 10:25 AM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>
>> Hi Elena,
>>
>> Sorry for the inconvenience caused by the refactoring of
>> SummarizedExperiment objects.
>>
>>> On 06/18/2015 03:41 AM, Elena Grassi wrote:
>>> Hello,
>>>
>>> I'm writing as long as I am struggling a bit to keep the pace of
>>> RangedSummarizedExperiment in my package roar, whose main class
>>> contains RangedSummarizedExperiment to hold some of the data.
>>> Sometimes the developers fix issues for me but I would like to ease
>>> their work as much as possible but for example today I stumbled upon
>>> this:
>>> http://bioconductor.org/checkResults/devel/bioc-LATEST/roar/zin1-buildsrc.html
>>>
>>> that is related to the fact that I build a RoarDataset object without
>>> rowRanges, colData etc at the beginning of the analysis and I fill
>>> them later.
>>>
>>> My questions are:
>>> - apart from looking around the svn logs and source code to understand
>>> what's going on have I missed some mail here or other information
>>> about the roadmap for what will come for SummarizedExperiment?
>>
>> A new class, SummarizedExperiment0, was introduced for representing
>> "degraded" RangedSummarizedExperiment objects, that is, objects with
>> no rowRanges component.
>>
>> So RangedSummarizedExperiment now derives from SummarizedExperiment0,
>> which in turn derives from Vector. As a consequence of deriving from
>> Vector, these objects now have a length (length(x) = nrow(x)) and can
>> have names (names(x) = rownames(x)).
>>
>> Another consequence of these changes is that the internal representation
>> of RangedSummarizedExperiment objects has changed so serialized
>> instances need to be updated and re-serialized. Also packages that
>> define classes that extend RangedSummarizedExperiment (like roar)
>> might need some tweaks. I'm in the process of re-serializing
>> RangedSummarizedExperiment objects and fixing the packages affected
>> by these changes (should be done before the end of the week).
>>
>> Note that SummarizedExperiment0 might not be the definitive name for
>> this class but we can't use the "SummarizedExperiment" name for this
>> until the old SummarizedExperiment class defined in GenomicRanges is
>> gone (i.e. not before BioC 3.3).
>>
>>> - it would be better to avoid extending such a class and instead
>>> simply having another slot to avoid such initializations issues?
>>
>> Not sure I understand what you're asking exactly. Can you provide
>> more details?
>>
>> Thanks for your patience and sorry again for the inconvenience.
>>
>> H.
>>
>>>
>>> Thanks,
>>> E.
>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>

-- 
Hervé Pagès
