[Bioc-devel] SummarizedExperiment with alternate back end

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Sat Sep 19 02:36:40 CEST 2015


Interesting, thanks for the pointer.

In light of the existing (and future) work on this, may I suggest an
eSet-like class, but built using the technologies in SummarizedExperiment,
i.e., a SummarizedExperiment without the rowRanges?  I would very much like
this for modern work using eSet-like containers.  Not everything has ranges.

Vince: I am not claiming that it is easy to work with; we have pains as
well.  But am I missing something or is the assay matrix only 2.3Gb?
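
As a rough check on that size, here is a minimal sketch in base R using the
dimensions from the dim(wbmse) output quoted below.  It assumes the assay is
a single dense matrix of doubles at 8 bytes per value; the variable names
are illustrative only.

```r
# Dimensions reported by dim(wbmse) in the quoted message
n_probes  <- 485577
n_samples <- 690

# Assumption: one dense numeric (double) matrix, 8 bytes per element
bytes_per_double <- 8
payload_bytes <- n_probes * n_samples * bytes_per_double

payload_bytes          # raw data payload in bytes, excluding dimnames and headers
payload_bytes / 1e9    # the same figure in decimal gigabytes
payload_bytes / 2^30   # the same figure in binary gibibytes
```

The payload comes to 2,680,385,040 bytes, within about 46 kB of the
2,680,430,992 bytes reported by object.size(assays(wbmse)).  That is
consistent with a single dense double matrix of roughly 2.7 GB (about
2.5 GiB).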

Best,
Kasper

On Fri, Sep 18, 2015 at 6:28 PM, Peter Haverty <haverty.peter at gene.com>
wrote:

> Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are good tricks
> for reducing the size of your eSets and SummarizedExperiments.  Both object
> types can go into assayData or assays. In fact, that's what they were
> designed for.
>
> At Genentech, we use these for our 2.5e6 x 1e3 rectangular data from
> Illumina SNP arrays.  We typically have ~6 such rectangular objects in one
> eSet.  With a mix of BigMatrix objects for point estimates and RleDataFrames
> for segmented data, readRDS times are quite reasonable.
>
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> phaverty at gene.com
>
> On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr. <tim.triche at gmail.com>
> wrote:
>
> > bigmemoryExtras (Peter Haverty's extensions to bigmemory/big.matrix) can be
> > handy for this, as it works well as a backend, especially if you go about
> > splitting by chromosome as for CNV segmentation, DMR finding, etc.  It's
> > not as seamless as one might like, but it's the closest thing I've found.
> >
> > SciDB tries to implement a similar API, but for a distributed version of
> > this where the data itself is in a columnar database and served on demand.
> > I tried getting that up and running as a SummarizedExperiment backend, but
> > did not succeed.  I have previously shoveled all of the TCGA 450k data into
> > one 7,000+ column bigMatrix which serializes to about 14GB on disk.
> >
> > If you have any replicates in your 700+ samples, it's a good idea to keep
> > their SNP calls in metadata(yourSE), although if you change names the
> > change needs to propagate into the dependent metadata.  This is why I
> > started monkeying around with linkedExperiments where those mappings are
> > enforced; it's becoming more of an issue with the TARGET pediatric AML
> > study, where there are numerous diagnosis-remission-relapse trios whose
> > identity I wish to verify periodically.  The SNPs on the 450k array are
> > great for this purpose, but minfi doesn't really have a slot for them per
> > se, so they live in metadata().
> >
> >
> > --t
> >
> > On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey <stvjc at channing.harvard.edu>
> > wrote:
> >
> > > I am dealing with ~700 450k arrays.  They are derived from one study, so
> > > it makes sense to think of them holistically.  Both the load time and
> > > the memory consumption are not satisfactory.
> > >
> > > Has anyone worked on an object type that implements the rangedSE API but
> > > has the assay data out of memory?
> > >
> > > > unix.time(load("wbmse.rda"))
> > >    user  system elapsed
> > >  30.131   2.396  61.036
> > > > object.size(wbmse)
> > > 124031032 bytes
> > > > dim(wbmse)
> > > [1] 485577    690
> > > > object.size(assays(wbmse))
> > > 2680430992 bytes
> > >
> > >         [[alternative HTML version deleted]]
> > >
> > > _______________________________________________
> > > Bioc-devel at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/bioc-devel