[Bioc-devel] SummarizedExperiment with alternate back end

Ryan rct at thompsonclan.org
Sat Sep 19 03:09:42 CEST 2015


In the dev version, SummarizedExperiment has been split into 
RangedSummarizedExperiment (equivalent to the current 
SummarizedExperiment, with rowRanges) and SummarizedExperiment (kind of 
like eSet, no rowRanges). Given that eSet objects also support multiple 
assayData elements, I believe the new SummarizedExperiment is pretty 
close to being eSet with different method names. In fact, I wonder if 
eSet could/should be reimplemented as a subclass of the new 
SummarizedExperiment class.
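
To make the split concrete, here is a minimal sketch, assuming the
SummarizedExperiment and GenomicRanges packages are installed. The toy
matrix, sample annotation, and ranges are made up for illustration. The
same constructor yields the eSet-like class when no rowRanges are given
and the ranged class when they are.

```r
# Sketch only: assumes Bioconductor's SummarizedExperiment and GenomicRanges.
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(GenomicRanges)
})

# Toy assay: 4 features x 3 samples (values are invented).
m <- matrix(as.numeric(1:12), nrow = 4,
            dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))
cd <- DataFrame(group = c("a", "b", "a"), row.names = colnames(m))

# No rowRanges: a plain SummarizedExperiment, the eSet-like container.
se <- SummarizedExperiment(assays = list(counts = m), colData = cd)

# With rowRanges: a RangedSummarizedExperiment.
gr <- GRanges(seqnames = rep("chr1", 4),
              ranges = IRanges(start = c(1, 101, 201, 301), width = 50))
names(gr) <- rownames(m)
rse <- SummarizedExperiment(assays = list(counts = m),
                            rowRanges = gr, colData = cd)

# The ranged class extends the plain one, so shared accessors work on both.
is(rse, "SummarizedExperiment")
is(se, "RangedSummarizedExperiment")
dim(assay(se, "counts"))
```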

On 9/18/15 5:36 PM, Kasper Daniel Hansen wrote:
> Interesting, thanks for the pointer.
>
> In light of the existing (and future) work on this, may I suggest an
> eSet-like class, but built using the technologies in SummarizedExperiment,
> i.e. a SummarizedExperiment without the rowRanges.  I would very much like
> this for modern work using eSet-like containers.  Not everything has ranges.
>
> Vince: I am not claiming that it is easy to work with; we have pains as
> well.  But am I missing something or is the assay matrix only 2.3Gb?
>
> Best,
> Kasper
>
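
On the size question: a quick back-of-the-envelope check, assuming the
assay is a single dense matrix of 8-byte doubles (an assumption; the
thread does not state the storage type). The dimensions and the
reported object.size are the ones from Vince's original message.

```r
# Dimensions reported by dim(wbmse) in the original message.
n_probes  <- 485577
n_samples <- 690

# Assumption: one dense numeric (double) matrix, 8 bytes per value.
bytes_per_double <- 8
dense_bytes <- n_probes * n_samples * bytes_per_double

# Value reported by object.size(assays(wbmse)).
reported_bytes <- 2680430992

dense_bytes                    # raw payload of the matrix in bytes
dense_bytes / 1e9              # the same in decimal gigabytes
dense_bytes / 1024^3           # the same in binary gibibytes
reported_bytes - dense_bytes   # what is left over as container overhead
```

Under that assumption the raw payload accounts for almost all of the
reported size, which is consistent with the assays holding a single
dense matrix.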
> On Fri, Sep 18, 2015 at 6:28 PM, Peter Haverty <haverty.peter at gene.com>
> wrote:
>
>> Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are good tricks
>> for reducing the size of your eSets and SummarizedExperiments.  Both object
>> types can go into assayData or assays. In fact, that's what they were
>> designed for.
>>
>> At Genentech, we use these for our 2.5e6 x 1e3 rectangular data from
>> Illumina SNP arrays.  We typically have ~6 such rectangular objects in one
>> eSet.  With a mix of BigMatrix objects for point estimates and RleDataFrames
>> for segmented data, readRDS times are quite reasonable.
>>
>>
>> Pete
>>
>> ____________________
>> Peter M. Haverty, Ph.D.
>> Genentech, Inc.
>> phaverty at gene.com
>>
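
For anyone wondering why run-length encoding helps so much with
segmented data, here is a small sketch using base R's rle() as a
stand-in. This is not genoset::RleDataFrame itself, and the segment
lengths and values are hypothetical.

```r
# Hypothetical segmented track: 5 segments covering 10,000 probes.
seg_lengths <- c(3000, 1500, 2500, 1000, 2000)
seg_values  <- c(0.0, 0.58, -1.0, 0.0, 0.32)

# Dense representation: one value per probe.
dense <- rep(seg_values, times = seg_lengths)

# Run-length encoded representation: one (length, value) pair per segment.
runs <- rle(dense)
length(runs$lengths)   # number of runs stored

# The encoding is lossless: decoding gives back the dense vector.
identical(inverse.rle(runs), dense)

# Compare the in-memory footprint of the two representations.
object.size(dense)
object.size(runs)
```

The saving scales with the ratio of probes to segments, so point
estimates that change at every probe gain nothing from this and are
better served by a file-backed matrix.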
>> On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr. <tim.triche at gmail.com>
>> wrote:
>>
>>> bigmemoryExtras (Peter Haverty's extensions to bigMemory/bigMatrix) can be
>>> handy for this, as it works well as a backend, especially if you go about
>>> splitting by chromosome as for CNV segmentation, DMR finding, etc.  It's
>>> not as seamless as one might like, but it's the closest thing I've found.
>>>
>>> SciDB tries to implement a similar API, but for a distributed version of
>>> this where the data itself is in a columnar database and served on demand.
>>> I tried getting that up and running as a SummarizedExperiment backend, but
>>> did not succeed.  I have previously shoveled all of the TCGA 450k data into
>>> one 7,000+ column bigMatrix which serializes to about 14GB on disk.
>>>
>>> If you have any replicates in your 700+ samples, it's a good idea to keep
>>> their SNP calls in metadata(yourSE), although if you change names, the
>>> change needs to propagate into the dependent metadata.  This is why I
>>> started monkeying around with linkedExperiments where those mappings are
>>> enforced; it's becoming more of an issue with the TARGET pediatric AML
>>> study, where there are numerous diagnosis-remission-relapse trios whose
>>> identity I wish to verify periodically.  The SNPs on the 450k array are
>>> great for this purpose, but minfi doesn't really have a slot for them per
>>> se, so they live in metadata().
>>>
>>>
>>> --t
>>>
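
The general idea behind an out-of-memory assay can be sketched in base R
alone: keep the matrix on disk in column-major order and read only the
sample columns you need. This is not how bigmemoryExtras or SciDB work
internally; read_columns() and the file layout are hypothetical names
and choices made up for this illustration.

```r
# Toy matrix written to a flat binary file of 8-byte doubles.
path  <- tempfile(fileext = ".bin")
n_row <- 1000L
n_col <- 20L
m <- matrix(seq_len(n_row * n_col) / 10, nrow = n_row)

con <- file(path, "wb")
writeBin(as.vector(m), con, size = 8)   # as.vector() gives column-major order
close(con)

# Hypothetical helper: read selected columns without loading the whole file.
read_columns <- function(path, n_row, cols) {
  con <- file(path, "rb")
  on.exit(close(con))
  out <- matrix(NA_real_, nrow = n_row, ncol = length(cols))
  for (k in seq_along(cols)) {
    # Column j starts (j - 1) * n_row values into the file.
    seek(con, where = (cols[k] - 1) * n_row * 8, origin = "start")
    out[, k] <- readBin(con, what = "double", n = n_row, size = 8)
  }
  out
}

# Pull two samples from disk and check them against the in-memory copy.
block <- read_columns(path, n_row, c(3L, 17L))
identical(block, m[, c(3, 17)])
```

A real backend also has to handle dimnames, row subsetting, and writes,
which is where the existing packages earn their keep.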
>>> On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey <stvjc at channing.harvard.edu>
>>> wrote:
>>>
>>>> i am dealing with ~700 450k arrays
>>>>
>>>> they are derived from one study, so it makes sense to think of them
>>>> holistically.
>>>>
>>>> both the load time and the memory consumption are not satisfactory.
>>>>
>>>> has anyone worked on an object type that implements the rangedSE API but
>>>> has the assay data out of memory?
>>>>
>>>>> unix.time(load("wbmse.rda"))
>>>>    user  system elapsed
>>>>  30.131   2.396  61.036
>>>>
>>>>> object.size(wbmse)
>>>> 124031032 bytes
>>>>
>>>>> dim(wbmse)
>>>> [1] 485577    690
>>>>
>>>>> object.size(assays(wbmse))
>>>> 2680430992 bytes
>>>>
>>>>          [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel


