[Bioc-devel] SummarizedExperiment with alternate back end

Vincent Carey stvjc at channing.harvard.edu
Sat Sep 19 00:37:15 CEST 2015


thanks to all, lots of potential here.

On Fri, Sep 18, 2015 at 3:28 PM, Peter Haverty <haverty.peter at gene.com>
wrote:

> Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are good
> tricks for reducing the size of your eSets and SummarizedExperiments.  Both
> object types can go into assayData or assays. In fact, that's what they
> were designed for.
>
> At Genentech, we use these for our 2.5e6 x 1e3 rectangular data from
> Illumina SNP arrays.  We typically have ~6 such rectangular objects in one
> eSet.  With a mix of BigMatrix objects for point estimates and RleDataFrames
> for segmented data, readRDS times are quite reasonable.
>
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phaverty at gene.com
>
> On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr. <tim.triche at gmail.com>
> wrote:
>
>> bigmemoryExtras (Peter Haverty's extensions to bigmemory/big.matrix) can be
>> handy for this, as it works well as a backend, especially if you go about
>> splitting by chromosome as for CNV segmentation, DMR finding, etc.   It's
>> not as seamless as one might like, but it's the closest thing I've found.
>>
>> SciDB tries to implement a similar API, but for a distributed version of
>> this where the data itself is in a columnar database and served on demand.
>> I tried getting that up and running as a SummarizedExperiment backend, but
>> did not succeed.  I have previously shoveled all of the TCGA 450k data into
>> one 7,000+ column bigMatrix which serializes to about 14GB on disk.
>>
>> If you have any replicates in your 700+ samples, it's a good idea to keep
>> their SNP calls in metadata(yourSE), although if you change names it needs
>> to propagate into the dependent metadata.  This is why I started monkeying
>> around with linkedExperiments where those mappings are enforced; it's
>> becoming more of an issue with the TARGET pediatric AML study, where there
>> are numerous diagnosis-remission-relapse trios whose identity I wish to
>> verify periodically.  The SNPs on the 450k array are great for this
>> purpose, but minfi doesn't really have a slot for them per se, so they live
>> in metadata().
>>
>>
>> --t
>>
>> On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey <stvjc at channing.harvard.edu>
>> wrote:
>>
>> > i am dealing with ~700 450k arrays
>> >
>> > they are derived from one study, so it makes sense to think of them
>> > holistically.
>> >
>> > both the load time and the memory consumption are not satisfactory.
>> >
>> > has anyone worked on an object type that implements the rangedSE API but
>> > has the assay data out of memory?
>> >
>> > > unix.time(load("wbmse.rda"))
>> >    user  system elapsed
>> >  30.131   2.396  61.036
>> > > object.size(wbmse)
>> > 124031032 bytes
>> > > dim(wbmse)
>> > [1] 485577    690
>> > > object.size(assays(wbmse))
>> > 2680430992 bytes
>> >
>> > _______________________________________________
>> > Bioc-devel at r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> >
>>
>
>
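
For scale, here is a rough sketch of where the ~2.7 GB reported by
object.size(assays(wbmse)) goes. It assumes the assay is a single dense
matrix of doubles (8 bytes per cell). The variable names are illustrative;
the numbers come from the transcript quoted above.

```r
# Back-of-envelope check, assuming one dense double-precision assay matrix.
n_probes  <- 485577        # rows, from dim(wbmse)
n_samples <- 690           # columns, from dim(wbmse)
bytes_per_double <- 8

# Storage needed for the matrix cells alone.
dense_bytes <- n_probes * n_samples * bytes_per_double

# Size reported in the transcript by object.size(assays(wbmse)).
reported_bytes <- 2680430992

# Whatever is left over is not cell data (presumably names and wrappers).
overhead_bytes <- reported_bytes - dense_bytes

cat(sprintf("dense cells: %.0f bytes (%.2f GiB)\n",
            dense_bytes, dense_bytes / 2^30))
cat(sprintf("remainder:   %.0f bytes\n", overhead_bytes))
```

Under that assumption nearly all of the footprint is the cell data itself,
which is the part a file-backed store such as BigMatrix would keep out of
memory.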
