[Bioc-devel] SummarizedExperiment with alternate back end

Sat Sep 19 03:18:50 CEST 2015

While we are on the topic, my GenoSet class will become a subclass of
RangedSummarizedExperiment, rather than eSet, after this upcoming release.
For this release both APIs work (colnames and sampleNames, etc.).

I think the range-free SummarizedExperiment will be great. I've seen a lot
of ExpressionSets with random, non-exprs stuff in the exprs slot for lack
of something more appropriate.
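
As a minimal sketch of what a range-free container might look like, assuming the SummarizedExperiment() constructor is called without rowRanges (the toy matrix, the assay name "values", and the sample names are made up for illustration):

```r
# Sketch only: toy data, hypothetical assay and sample names.
library(SummarizedExperiment)

# A small feature-by-sample matrix with no genomic coordinates attached.
m <- matrix(rnorm(12), nrow = 4,
            dimnames = list(paste0("feature", 1:4), paste0("sample", 1:3)))

# No rowRanges supplied, so no ranges are attached to the rows.
se <- SummarizedExperiment(
    assays  = list(values = m),
    colData = DataFrame(group = c("a", "a", "b"),
                        row.names = colnames(m))
)

class(se)       # class of the resulting container
dim(se)         # features x samples
colnames(se)    # SummarizedExperiment-style accessor for sample names
```

Arbitrary per-feature annotation can still go into rowData(se), so nothing has to be squeezed into a slot that was meant for expression values.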

Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com

On Fri, Sep 18, 2015 at 6:09 PM, Ryan <rct at thompsonclan.org> wrote:

> In the dev version, SummarizedExperiment has been split into
> RangedSummarizedExperiment (equivalent to the current
> SummarizedExperiment, with rowRanges) and SummarizedExperiment (kind of
> like eSet, no rowRanges). Given that eSet objects also support multiple
> assayData elements, I believe the new SummarizedExperiment is pretty close
> to being eSet with different method names. In fact, I wonder if eSet
> could/should be reimplemented as a subclass of the new SummarizedExperiment
> class.
>
>
> On 9/18/15 5:36 PM, Kasper Daniel Hansen wrote:
>
>> Interesting, thanks for the pointer.
>>
>> In light of the existing (and future) work on this, may I suggest an
>> eSet-like class, but built using the technologies in SummarizedExperiment,
>> i.e. a SummarizedExperiment without the rowRanges.  I would very much like
>> this for modern work using eSet-like containers.  Not everything has ranges.
>>
>> Vince: I am not claiming that it is easy to work with; we have pains as
>> well.  But am I missing something or is the assay matrix only 2.3Gb?
>>
>> Best,
>> Kasper
>>
>> On Fri, Sep 18, 2015 at 6:28 PM, Peter Haverty <haverty.peter at gene.com>
>> wrote:
>>
>>> Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are good
>>> tricks for reducing the size of your eSets and SummarizedExperiments.
>>> Both object types can go into assayData or assays. In fact, that's what
>>> they were designed for.
>>>
>>> At Genentech, we use these for our 2.5e6 x 1e3 rectangular data from
>>> Illumina SNP arrays.  We typically have ~6 such rectangular objects in
>>> one eSet.  With a mix of BigMatrix objects for point estimates and
>>> RleDataFrames for segmented data, readRDS times are quite reasonable.
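
A rough illustration of why run-length encoding suits segmented data. This uses base R's rle() rather than genoset::RleDataFrame, and the segment values and lengths are made up:

```r
# Sketch only: base R rle(), not genoset::RleDataFrame; values are made up.
# A segmented copy-number track: long runs of identical values per segment.
seg_values  <- c(2, 3, 1, 2)             # hypothetical segment means
seg_lengths <- c(5000, 1200, 300, 3500)  # hypothetical probes per segment
track <- rep(seg_values, times = seg_lengths)

# Run-length encoding stores one value and one length per segment.
encoded <- rle(track)

length(track)            # number of probes in the expanded vector
length(encoded$values)   # number of runs actually stored

# The encoding is lossless: decoding recovers the original vector.
stopifnot(identical(inverse.rle(encoded), track))

# Compare in-memory footprints of the two representations.
object.size(track)
object.size(encoded)
```

The saving depends entirely on how long the runs are; noisy point estimates with few repeated values would not compress this way, which is presumably why they go into a BigMatrix instead.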
>>>
>>>
>>> Pete
>>>
>>> ____________________
>>> Peter M. Haverty, Ph.D.
>>> Genentech, Inc.
>>> phaverty at gene.com
>>>
>>> On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr. <tim.triche at gmail.com>
>>> wrote:
>>>
>>>> bigmemoryExtras (Peter Haverty's extensions to bigmemory/big.matrix) can
>>>> be handy for this, as it works well as a backend, especially if you go
>>>> about splitting by chromosome as for CNV segmentation, DMR finding, etc.
>>>> It's not as seamless as one might like, but it's the closest thing I've
>>>> found.
>>>>
>>>> SciDB tries to implement a similar API, but for a distributed version of
>>>> this where the data itself is in a columnar database and served on
>>>> demand.  I tried getting that up and running as a SummarizedExperiment
>>>> backend, but did not succeed.  I have previously shoveled all of the
>>>> TCGA 450k data into one 7,000+ column BigMatrix which serializes to
>>>> about 14GB on disk.
>>>>
>>>> If you have any replicates in your 700+ samples, it's a good idea to
>>>> keep their SNP calls in metadata(yourSE), although if you change names
>>>> it needs to propagate into the dependent metadata.  This is why I
>>>> started monkeying around with linkedExperiments where those mappings are
>>>> enforced; it's becoming more of an issue with the TARGET pediatric AML
>>>> study, where there are numerous diagnosis-remission-relapse trios whose
>>>> identity I wish to verify periodically.  The SNPs on the 450k array are
>>>> great for this purpose, but minfi doesn't really have a slot for them
>>>> per se, so they live in metadata().
>>>>
>>>>
>>>> --t
>>>>
>>>> On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey
>>>> <stvjc at channing.harvard.edu> wrote:
>>>>
>>>>> i am dealing with ~700 450k arrays
>>>>>
>>>>> they are derived from one study, so it makes sense to think of them
>>>>> holistically.
>>>>>
>>>>> both the load time and the memory consumption are not satisfactory.
>>>>>
>>>>> has anyone worked on an object type that implements the rangedSE API but
>>>>> has the assay data out of memory?
>>>>>
>>>>> > unix.time(load("wbmse.rda"))
>>>>>    user  system elapsed
>>>>>  30.131   2.396  61.036
>>>>>
>>>>> > object.size(wbmse)
>>>>> 124031032 bytes
>>>>>
>>>>> > dim(wbmse)
>>>>> [1] 485577    690
>>>>>
>>>>> > object.size(assays(wbmse))
>>>>> 2680430992 bytes
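
A quick back-of-the-envelope check on the assay size reported above, assuming the assay is a single dense matrix of 8-byte doubles (an assumption; the thread does not say how the assay is stored):

```r
# Sketch only: assumes one dense double-precision assay matrix.
n_features       <- 485577      # rows, from dim(wbmse)
n_samples        <- 690         # columns, from dim(wbmse)
bytes_per_double <- 8

# Payload of a dense numeric matrix with these dimensions.
expected_bytes <- n_features * n_samples * bytes_per_double

expected_bytes / 1e9    # size in decimal gigabytes (GB)
expected_bytes / 2^30   # size in binary gibibytes (GiB)

# Compare with the figure reported by object.size(assays(wbmse)).
reported_bytes <- 2680430992
reported_bytes - expected_bytes   # remainder not explained by the payload
```

Under that assumption the dense matrix alone accounts for roughly 2.68e9 bytes, about 2.5 GiB, which is nearly all of the reported object.size().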
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel