[Bioc-devel] strange behavior on memory usage (fwd)
Rafael A. Irizarry
ririzarr at jhsph.edu
Wed Aug 24 02:43:23 CEST 2005
On Mon, 22 Aug 2005, Vincent Carey 525-2265 wrote:
>> hi everyone,
>>
>> i was wondering if anybody could give me a hint of what causes a strange
>> behavior on memory usage when using oligo/makePlatformDesign packages.
>>
>> i'm reading a bunch of (affy) SNP chips:
>>
>>> x = read.celfiles(list.celfiles())
>> -> at this point the R process uses around 2GB
>> -> which does not look bad, since i'm reading 90 samples
>>> show(x)
>> -> now the R process uses around 6GB
>> -> how can i improve the code so it does not use so much memory?
>> -> the information i'm using at this step comes basically from
>> -> dim(getExpData(x, "exprs"))
>
> I have not tried to reproduce this yet for lack of time. But it
> seems to me that the principle we need to establish here is:
> for any massive data structure, we need to put relevant metadata in slots,
> and interrogate only those slots. I don't know what dim() or getExpData()
> are doing, but my guess is that they are making some copies of something
> that they shouldn't need. you mention an issue with str() also -- now
> perhaps we need to write an oligobatch method for str that doesn't
> poke around too much? not sure
>
> Let's put the necessary dimension data in slots and be sure to update
> those slots whenever subsetting is done. And anything that show() needs
> should likewise be available without doing anything to the potentially
> massive datastructures.
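To make the slot idea above concrete, here is a minimal sketch. It is not the oligo or Biobase implementation: the class name cachedBatch, its slots bigData and dims, and the constructor newCachedBatch are all hypothetical names made up for illustration. The point is only that show() reads a small cached slot, and that subsetting refreshes it.

```r
library(methods)

# Hypothetical class: 'bigData' stands in for the large intensity matrix,
# 'dims' caches its dimensions so show() never has to touch 'bigData'.
setClass("cachedBatch",
         representation(bigData = "matrix", dims = "integer"))

# Hypothetical constructor: fills the cached slot once, at creation time.
newCachedBatch <- function(m) {
  new("cachedBatch", bigData = m, dims = dim(m))
}

# show() reads only the small metadata slot.
setMethod("show", "cachedBatch", function(object) {
  cat("cachedBatch with", object@dims[1], "rows and",
      object@dims[2], "samples\n")
})

# Subsetting rebuilds the object and refreshes the cached dimensions.
setMethod("[", "cachedBatch", function(x, i, j, ..., drop = FALSE) {
  m <- x@bigData[i, j, drop = FALSE]
  new("cachedBatch", bigData = m, dims = dim(m))
})

b <- newCachedBatch(matrix(rnorm(20), nrow = 5))
b             # uses only the 'dims' slot
b[1:3, 1:2]   # 'dims' is updated by the subsetting method
```

Whether this actually avoids the copies seen in the original report would still need to be checked on the real oligoBatch object.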
>
> A couple of other points:
> 1) I noticed that a pdmapping environment has X and Y as vectors of integers.
> These are pretty big. Is it possible to use i2xy and xy2i software to get
> rid of these completely? these functions can be put into the environment,
> and the necessary offsets can be updated whenever a subset is done using
> a closure construct
the problem here is nimblegen. it's hard to explain, but there is no
mapping from x,y to index.
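For platforms that do sit on a regular grid, the closure idea in point 1 could look roughly like the sketch below. makeXYMapper is a hypothetical helper, not the existing i2xy/xy2i code; it assumes 0-based (x, y) coordinates, 1-based linear indices, and row-by-row ordering, and 1600 columns is only an illustrative value. As noted above, it would not cover the nimblegen case.

```r
# Hypothetical helper: returns xy2i/i2xy functions for a regular grid.
# 'ncol' is captured in the closure, so no X or Y vectors need to be stored.
makeXYMapper <- function(ncol) {
  list(
    # (x, y) are 0-based; the returned index is 1-based.
    xy2i = function(x, y) y * ncol + x + 1,
    # Inverse mapping: recover 0-based (x, y) from a 1-based index.
    i2xy = function(i) cbind(x = (i - 1) %% ncol, y = (i - 1) %/% ncol)
  )
}

map <- makeXYMapper(1600)     # 1600 is an assumed grid width
i <- map$xy2i(x = 3, y = 2)   # linear index of feature (3, 2)
map$i2xy(i)                   # maps back to x = 3, y = 2
```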
> 2) installed package footprints with large .rda structures can be enormous, approaching
> 1GB. We can use save(...,compress=TRUE) to reduce the installed footprint
> and the usage overhead at load time seems quite acceptable. I got the
> pdmapping50khind240.rda down from 440MB to 60MB with this method. I understand
> that compress=TRUE has no impact on the compressed preinstallation package size.
> I am concerned about postinstall footprints.
>
how much longer does it take to load though? my thinking was that i'd rather
have a large object and faster loads.
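One way to answer the load-time question is simply to measure it. The sketch below uses a small toy object in place of the real pdmapping data, so the numbers it prints say nothing about the 440MB file; it only shows how the comparison could be set up with save(), load(), file.info() and system.time().

```r
# Toy object standing in for a large pdmapping structure (an assumption;
# the real object is far bigger and has a different layout).
obj <- list(X = rep(1:1000, 200), Y = rep(1:200, each = 1000))

f_raw <- tempfile(fileext = ".rda")
f_gz  <- tempfile(fileext = ".rda")
save(obj, file = f_raw, compress = FALSE)
save(obj, file = f_gz,  compress = TRUE)

# Compare on-disk sizes (bytes): uncompressed first, compressed second.
sizes <- file.info(c(f_raw, f_gz))$size

# Compare load times; each load goes into a fresh environment.
t_raw <- system.time(load(f_raw, envir = new.env()))
t_gz  <- system.time(load(f_gz,  envir = new.env()))

print(sizes)
print(t_raw)
print(t_gz)
```

Running the same comparison on the actual pdmapping50khind240.rda would show whether the extra decompression time matters in practice.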
>>> gc()
>> -> back to 2GB
>>
>> in the above, 'x' is an oligoBatch object (which contains eSet, details at the
>> end of this message).
>>
>> any suggestion?
>>
>> thanks a lot,
>>
>> benilton
>>
>> ps: i just noticed that using dim(exprs(x)) in show() reduces the memory usage
>> from 6GB to 3.5GB... and using str(x) increases it to 10.5GB.
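A quick back-of-the-envelope check of these figures, using the dimensions of the exprs matrix shown in the str(x) output further down (2560000 x 90 doubles). The variable names are only for illustration.

```r
# Dimensions taken from the str(x) output: exprs is num [1:2560000, 1:90].
n_rows <- 2560000
n_cols <- 90
bytes_per_double <- 8

# Size of one full copy of the matrix, in GiB.
one_copy_gib <- n_rows * n_cols * bytes_per_double / 2^30
one_copy_gib
```

One copy comes to about 1.7 GiB, which is in line with the roughly 2GB seen right after read.celfiles(). The jumps to 3.5GB, 6GB and 10.5GB would be consistent with one or more temporary full copies of the matrix being made, but that is only a guess until someone profiles it.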
>>
>> -----------------------------------------------------------------------------
>> R version 2.2.0, 2005-07-26, x86_64-unknown-linux-gnu
>>
>> attached base packages:
>> [1] "tools" "methods" "stats" "graphics" "grDevices" "utils"
>> [7] "datasets" "base"
>>
>> other attached packages:
>> oligo reposTools Biobase
>> "0.0.7" "1.6.0" "1.6.6"
>> -------------------------------------------------------------------------------
>>
>>> str(x)
>> Formal class 'oligoBatch' [package "oligo"] with 8 slots
>> ..@ manufacturer: chr "Affymetrix"
>> ..@ platform : chr "Mapping50K_Hind240"
>> ..@ eList :Formal class 'exprList' [package "Biobase"] with 2 slots
>> .. .. ..@ eMetadata:`data.frame': 0 obs. of 0 variables
>> .. .. ..@ eList :List of 1
>> .. .. .. ..$ exprs: num [1:2560000, 1:90] 1369 65472 ...
>> .. .. .. .. ..- attr(*, "dimnames")=List of 2
>> .. .. .. .. .. ..$ : NULL
>> .. .. .. .. .. ..$ : chr [1:90] "NA06985_Hind_B5_3005533.CEL" ...
>> ..@ description :Formal class 'MIAME' [package "Biobase"] with 11 slots
>> .. .. ..@ name : chr ""
>> .. .. ..@ lab : chr ""
>> .. .. ..@ contact : chr ""
>> .. .. ..@ title : chr ""
>> .. .. ..@ abstract : chr ""
>> .. .. ..@ url : chr ""
>> .. .. ..@ samples : list()
>> .. .. ..@ hybridizations: list()
>> .. .. ..@ normControls : list()
>> .. .. ..@ preprocessing :List of 2
>> .. .. .. ..$ filenames : chr [1:90] "NA06985_Hind_B5_3005533.CEL" ...
>> .. .. .. ..$ oligoversion: chr NA
>> .. .. ..@ other : list()
>> ..@ annotation : chr ""
>> ..@ sampleNames : chr [1:90] "NA06985_Hind_B5_3005533.CEL" ...
>> ..@ notes : chr ""
>> ..@ phenoData :Formal class 'phenoData' [package "Biobase"] with 3 slots
>> .. .. ..@ pData :`data.frame': 90 obs. of 1 variable:
>> .. .. .. ..$ sample: int [1:90] 1 2 3 4 5 6 7 8 9 10 ...
>> .. .. ..@ varLabels :List of 1
>> .. .. .. ..$ sample: chr "arbitrary numbering"
>> .. .. ..@ varMetadata:`data.frame': 0 obs. of 0 variables
>>
>> _______________________________________________
>> Bioc-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>