[Bioc-devel] strange behavior on memory usage

Wolfgang Huber huber at ebi.ac.uk
Tue Aug 23 23:47:41 CEST 2005


Hi Vince, et al.

It seems to me the problem is bigger than fixing the "show" method
and caching (duplicating) e.g. the dimension information in extra slots.
I am a bit worried that if "getExpData" is such a memory hog, the whole
eSet class becomes much less useful, and people might be tempted to
revert to plain matrices for performance-critical computations. Is
there a better way to do this that avoids the overhead of "getExpData"
in the first place? (I guess we might need somebody who understands
memory management in R and perhaps can even write some of the necessary
infrastructure in C.)

One of the many things I don't understand in Benilton's email is this:
"ps: i just noticed that using dim(exprs(x)) in show() reduces the
memory usage from 6GB to 3.5GB... ". The implementation of exprs() is

setMethod("exprs", "eSet",
           function(object) getExpData(object, "exprs")
           )

i.e. it just calls getExpData, so both routes should cost the same:

setMethod("getExpData", c("eSet", "character"),
           function(object, name) {
               object@eList[[name]] })
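
One way to check whether the accessor itself is responsible would be to measure peak memory around the call. The following is only a minimal sketch: "miniSet", "getData" and "peakGrowthMb" are made-up names standing in for the eSet layout, not Biobase code, and the numbers will depend on the R version (R 2.2.0 may well behave differently from current releases).

```r
library(methods)

# Hypothetical stand-in for the eSet layout: a list-valued slot holding one
# matrix. "miniSet" and "getData" are illustrative names, not Biobase API.
setClass("miniSet", representation(eList = "list"))
setGeneric("getData", function(object, name) standardGeneric("getData"))
setMethod("getData", c("miniSet", "character"),
          function(object, name) object@eList[[name]])

# Peak growth of the vector heap (in Mb) while evaluating an expression.
# gc(reset = TRUE) resets the "max used" statistics; a Vcell is 8 bytes.
peakGrowthMb <- function(expr) {
    before <- gc(reset = TRUE)["Vcells", "used"]
    force(expr)                              # evaluate the lazy argument
    peak <- gc()["Vcells", "max used"]
    as.numeric(peak - before) * 8 / 2^20
}

x <- new("miniSet", eList = list(exprs = matrix(0, nrow = 20000, ncol = 90)))

# Same query twice: once through the accessor alone, once with "+ 0",
# which deliberately allocates a second matrix for comparison.
viaAccessor <- peakGrowthMb(dim(getData(x, "exprs")))
forcedCopy  <- peakGrowthMb(dim(getData(x, "exprs") + 0))

print(c(viaAccessor = viaAccessor, forcedCopy = forcedCopy))
```

If the first figure comes out close to the second, the accessor is duplicating the matrix; if it stays near zero, the extra memory must come from somewhere else in show().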

  Best,
  Wolfgang



Vincent Carey 525-2265 wrote:
>>hi everyone,
>>
>>i was wondering if anybody could give me a hint of what causes a strange
>>behavior on memory usage when using oligo/makePlatformDesign packages.
>>
>>i'm reading a bunch of (affy) SNP chips:
>>
>>
>>>x = read.celfiles(list.celfiles())
>>
>>     -> at this point the R process uses around 2GB
>>     -> which does not look bad, since i'm reading 90 samples
>>
>>>show(x)
>>
>>     -> now the R process uses around 6GB
>>     -> how can i improve the code so it does not use so much memory?
>>     -> the information i'm using at this step comes basically from
>>     ->       dim(getExpData(x, "exprs"))
> 
> 
> I have not tried to reproduce this yet for lack of time.  But it
> seems to me that the principle we need to establish here is:
> for any massive data structure, we need to put relevant metadata in slots,
> and interrogate only those slots.  I don't know what dim() or getExpData()
> are doing, but my guess is that they are making some copies of something
> that they shouldn't need.  You mention an issue with str() also; perhaps
> we need to write an oligoBatch method for str() that doesn't
> poke around too much?  Not sure.
> 
> Let's put the necessary dimension data in slots and be sure to update
> those slots whenever subsetting is done.  And anything that show() needs
> should likewise be available without doing anything to the potentially
> massive datastructures.
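
The slot-caching idea above could look roughly like the following. This is a sketch under assumptions, not the Biobase or oligo design: the class "bigSet", its "dims" slot and the constructor are hypothetical names chosen for illustration.

```r
library(methods)

# Hypothetical class: "bigSet" and its "dims" slot are illustrative names,
# not part of Biobase or oligo.
setClass("bigSet", representation(eList = "list", dims = "integer"))

# Constructor: the dimensions are computed once, when the data go in.
bigSet <- function(exprs) {
    new("bigSet", eList = list(exprs = exprs), dims = dim(exprs))
}

# show() reads only the cached slot and never touches the matrix.
setMethod("show", "bigSet", function(object) {
    cat("bigSet with", object@dims[1], "features and",
        object@dims[2], "samples\n")
})

# Subsetting goes back through the constructor, so the cached dimensions
# are refreshed together with the data.
setMethod("[", "bigSet", function(x, i, j, ..., drop = FALSE) {
    if (missing(i)) i <- seq_len(x@dims[1])
    if (missing(j)) j <- seq_len(x@dims[2])
    bigSet(x@eList[["exprs"]][i, j, drop = FALSE])
})

x <- bigSet(matrix(rnorm(20), nrow = 5, ncol = 4))
show(x)
y <- x[1:3, 2:4]
show(y)
```

Routing every subset through the constructor is one simple way to keep the cached slot in step with the data; a validity method comparing the slot with dim() of the matrix would be a possible extra safeguard.
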
> 
> A couple of other points:
> 1) I noticed that a pdmapping environment has X and Y as vectors of integers.
> These are pretty big.  Is it possible to use i2xy and xy2i software to get
> rid of these completely?  These functions can be put into the environment,
> and the necessary offsets can be updated whenever a subset is done, using
> a closure construct.
> 2) installed package footprints with large .rda structures can be enormous, approaching
> 1GB.  We can use save(...,compress=TRUE) to reduce the installed footprint
> and the usage overhead at load time seems quite acceptable.  I got the
> pdmapping50khind240.rda down from 440MB to 60MB with this method.  I understand
> that compress=TRUE has no impact on the compressed preinstallation package size.
> I am concerned about postinstall footprints.
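
The effect described in point 2 can be tried on toy data. A minimal sketch, assuming a highly repetitive integer vector as a stand-in for a real platform-design mapping (the object name "mapping" and the data are made up); the ratio for a real .rda will differ. In current R releases compress = TRUE is already the default for save().

```r
# Illustrative data only: a large, repetitive integer vector standing in
# for a platform-design mapping.
mapping <- rep(1:1000, times = 2000)

plain  <- tempfile(fileext = ".rda")
packed <- tempfile(fileext = ".rda")

save(mapping, file = plain,  compress = FALSE)
save(mapping, file = packed, compress = TRUE)

sizes <- c(uncompressed = file.size(plain), compressed = file.size(packed))
print(sizes)

# Loading is the same call either way; load() detects the compression.
env <- new.env()
load(packed, envir = env)
roundTripOk <- identical(env$mapping, mapping)

unlink(c(plain, packed))
```
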
> 
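
For point 1, a closure-based replacement for the stored X and Y vectors might be sketched as follows. Everything here is an assumption for illustration: the factory "makeCoordMap", the 0-based x/y convention with i = x + nc * y + 1, and the 1600-column chip are not taken from oligo or makePlatformDesign.

```r
# Hypothetical factory returning i2xy/xy2i closures for a chip with `nc`
# columns. `kept` holds the original cell indices of the rows that survive
# a subset; NULL means "all cells, in original order".
makeCoordMap <- function(nc, kept = NULL) {
    i2xy <- function(i) {
        orig <- if (is.null(kept)) i else kept[i]
        cbind(x = (orig - 1L) %% nc, y = (orig - 1L) %/% nc)
    }
    xy2i <- function(x, y) {
        orig <- x + nc * y + 1L
        if (is.null(kept)) orig else match(orig, kept)
    }
    # Subsetting returns a new pair of closures with updated offsets.
    subset <- function(rows) {
        makeCoordMap(nc, if (is.null(kept)) rows else kept[rows])
    }
    list(i2xy = i2xy, xy2i = xy2i, subset = subset)
}

full <- makeCoordMap(1600L)
print(full$i2xy(c(1L, 1601L)))      # coordinates computed, not stored
print(full$xy2i(0L, 1L))

sub <- full$subset(c(1601L, 3L))    # keep two cells, in this order
print(sub$i2xy(1L))
print(sub$xy2i(2L, 0L))
```

Note that a subset still has to remember which cells it kept, so only the full chip gets rid of the index vectors entirely; the saving is that two coordinate vectors shrink to at most one.
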
> 
>>>gc()
>>
>>     -> back to 2GB
>>
>>in the above, 'x' is an oligoBatch object (which contains eSet, details at the
>>end of this message).
>>
>>any suggestion?
>>
>>thanks a lot,
>>
>>benilton
>>
>>ps: i just noticed that using dim(exprs(x)) in show() reduces the memory usage
>>from 6GB to 3.5GB... and using str(x) increases it to 10.5GB.
>>
>>-----------------------------------------------------------------------------
>>R version 2.2.0, 2005-07-26, x86_64-unknown-linux-gnu
>>
>>attached base packages:
>>[1] "tools"     "methods"   "stats"     "graphics"  "grDevices" "utils"
>>[7] "datasets"  "base"
>>
>>other attached packages:
>>      oligo reposTools    Biobase
>>    "0.0.7"    "1.6.0"    "1.6.6"
>>-------------------------------------------------------------------------------
>>
>>
>>>str(x)
>>
>>Formal class 'oligoBatch' [package "oligo"] with 8 slots
>>   ..@ manufacturer: chr "Affymetrix"
>>   ..@ platform    : chr "Mapping50K_Hind240"
>>   ..@ eList       :Formal class 'exprList' [package "Biobase"] with 2 slots
>>   .. .. ..@ eMetadata:`data.frame':     0 obs. of  0 variables
>>   .. .. ..@ eList    :List of 1
>>   .. .. .. ..$ exprs: num [1:2560000, 1:90]  1369 65472  ...
>>   .. .. .. .. ..- attr(*, "dimnames")=List of 2
>>   .. .. .. .. .. ..$ : NULL
>>   .. .. .. .. .. ..$ : chr [1:90] "NA06985_Hind_B5_3005533.CEL" ...
>>   ..@ description :Formal class 'MIAME' [package "Biobase"] with 11 slots
>>   .. .. ..@ name          : chr ""
>>   .. .. ..@ lab           : chr ""
>>   .. .. ..@ contact       : chr ""
>>   .. .. ..@ title         : chr ""
>>   .. .. ..@ abstract      : chr ""
>>   .. .. ..@ url           : chr ""
>>   .. .. ..@ samples       : list()
>>   .. .. ..@ hybridizations: list()
>>   .. .. ..@ normControls  : list()
>>   .. .. ..@ preprocessing :List of 2
>>   .. .. .. ..$ filenames   : chr [1:90] "NA06985_Hind_B5_3005533.CEL" ...
>>   .. .. .. ..$ oligoversion: chr NA
>>   .. .. ..@ other         : list()
>>   ..@ annotation  : chr ""
>>   ..@ sampleNames : chr [1:90] "NA06985_Hind_B5_3005533.CEL" ...
>>   ..@ notes       : chr ""
>>   ..@ phenoData   :Formal class 'phenoData' [package "Biobase"] with 3 slots
>>   .. .. ..@ pData      :`data.frame':   90 obs. of  1 variable:
>>   .. .. .. ..$ sample: int [1:90] 1 2 3 4 5 6 7 8 9 10 ...
>>   .. .. ..@ varLabels  :List of 1
>>   .. .. .. ..$ sample: chr "arbitrary numbering"
>>   .. .. ..@ varMetadata:`data.frame':   0 obs. of  0 variables
>>

-- 
Best regards
   Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax:   +44 1223 494486
Http:  www.ebi.ac.uk/huber


