[Bioc-devel] minimizing copies when creating ExpressionSet
bcarvalh at jhsph.edu
Tue Nov 10 22:58:13 CET 2009
thanks a lot for your help on this matter. I'll give your suggestions a try.
With best wishes,
On Nov 7, 2009, at 11:24 PM, Martin Morgan wrote:
> Hi Benilton --
> I think through the 'front door' and in the current release / devel
> versions, the answer is no. The problem is that the row and column
> names of assayData, phenoData, protocolData and featureData are all
> made to be the same, and this is done by identifying the appropriate
> names and doing the assignment, e.g., the equivalent of
> colnames(assayData[["exprs"]]) <- ... But this triggers a copy of
> assayData[["exprs"]], and so doubles the memory requirement.
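[Editorial aside: a quick way to see the copy described here is base R's
tracemem(), which reports when an object is duplicated; it needs an R build
with memory profiling enabled, which the standard CRAN binaries have. A
scaled-down sketch, with illustrative sizes:]

```r
# Assigning dimnames to a matrix duplicates it; tracemem() reports the copy.
m <- matrix(0, 1000, 10)                      # small stand-in for exprs
invisible(try(tracemem(m), silent = TRUE))    # start tracking, if supported
colnames(m) <- paste0("sample", seq_len(10))  # duplication happens here
```

On a profiling-enabled build the assignment prints a `tracemem[... -> ...]`
line, which is the extra copy that doubles the memory requirement at scale.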
> But if the row / col names are made identical ahead of time, then one
> can make some headway by building up the appropriate data components,
> coordinating the row and column names 'up front':
> assayData <- assayDataNew(exprs=matrix(0., 6.5e6, 70))
> phenoData <- annotatedDataFrameFrom(assayData[["exprs"]], FALSE)
> protocolData <- annotatedDataFrameFrom(assayData[["exprs"]], FALSE)
> featureData <- annotatedDataFrameFrom(assayData[["exprs"]], TRUE)
> and then creating and assembling the ExpressionSet one slot at a time,
> being careful to ensure that the resulting object is valid
> eset <- new("ExpressionSet")
> slot(eset, "assayData") <- assayData
> slot(eset, "phenoData") <- phenoData
> slot(eset, "featureData") <- featureData
> slot(eset, "protocolData") <- protocolData
> validObject(eset)
> ## [1] TRUE
> dim(eset)
> ## Features  Samples
> ##  6500000       70
> I sort of feel like this is a "rogue's game", where the user will
> quickly run into the situation where they want to do something that
> triggers a copy of the large data, and then they're in trouble again.
>> eset1 <- eset[,-1]
> Error: cannot allocate vector of size 3.3 Gb
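[Editorial aside: the 3.3 Gb figure is the fresh allocation the subset
needs, since `[` always copies: 6.5e6 rows x 69 surviving columns x 8 bytes
per double is about 3.3 GB. A scaled-down base-R illustration:]

```r
x <- matrix(0, 6500, 70)   # stand-in for the 6.5e6 x 70 exprs matrix
y <- x[, -1]               # subsetting copies the surviving 69 columns
dim(y)                     # 6500 69: a freshly allocated matrix
# at full scale: 6.5e6 * 69 * 8 bytes / 2^30 is ~3.3 GB, matching the error
```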
> Benilton Carvalho wrote:
>> my bad... after creating either y1 or y2, resident memory used is
>> roughly 10GB (i'm counting here the 'x' object too, so i think
>> about 7GB is used to create either object).
>> my question is whether there's something i'm missing that would
>> minimize the use of these 7GB...
>> sorry for the typo and possibly not making myself clear.
>> On Nov 7, 2009, at 6:11 PM, Benilton Carvalho wrote:
>>> given the following:
>>> x = matrix(pi, nr=6.5e6, nc=70) ##3.4GB
>>> y1 = new("ExpressionSet", exprs=x)
>>> y2 = new("ExpressionSet", assayData=assayDataNew("environment", exprs=x))
>>> Is there any obvious way of reducing the memory footprint when
>>> creating y1 and/or y2? With y1, it takes me around 18GB RAM... with
>>> y2, around 10GB. Is there anything else I can do from my end to
>>> minimize this?
>>> Thanks a lot,
>>> Bioc-devel at stat.math.ethz.ch mailing list
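[Editorial aside: some context on why the environment-backed y2 above is
cheaper than y1. R environments have reference semantics, so a matrix stored
in one can be handed around without the copy-on-assignment that ordinary
value semantics incur. A small base-R sketch; the names are illustrative:]

```r
# Environments are passed by reference, so a large matrix stored in one
# can be shared across functions without being duplicated at the call.
e <- new.env()
e$exprs <- matrix(0, 100, 70)   # scaled-down stand-in for the real data

mark <- function(env) env$exprs[1, 1] <- 1  # hypothetical helper

mark(e)
e$exprs[1, 1]  # 1: the change is visible to the caller; no copy was passed
```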
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793