[BioC] made4, memory issues

Thu Nov 8 18:09:11 CET 2007

Dear Amin and BioC (sorry I forgot to cc BioC user)

We solved this issue.

Since you are matching on genes, transpose both datasets before calling cia:

data1723 <- read.csv('1723.csv',header=TRUE,row.names="geneNames")
data2224 <- read.csv('2224.csv',header=TRUE,row.names="geneNames")
coin <- cia(t(data1723),t(data2224))
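
The transposed call assumes that both tables list the same genes in the same order. A minimal sketch of how that could be checked first, assuming the gene names are the row names of both tables; prepare_for_cia and the toy data are hypothetical and are not part of made4:

```r
# Hypothetical helper (not part of made4): keep the genes shared by both
# tables, put them in the same order, and transpose so genes become columns.
prepare_for_cia <- function(x, y) {
  shared <- intersect(rownames(x), rownames(y))
  if (length(shared) == 0) stop("no gene names in common")
  list(x = as.data.frame(t(x[shared, , drop = FALSE])),
       y = as.data.frame(t(y[shared, , drop = FALSE])))
}

# Toy data standing in for 1723.csv and 2224.csv (5 genes x 3 samples each).
# The second table lists the genes in reverse order on purpose.
set.seed(1)
genes <- paste0("gene", 1:5)
toy1 <- data.frame(matrix(runif(15), nrow = 5,
                          dimnames = list(genes, paste0("a", 1:3))))
toy2 <- data.frame(matrix(runif(15), nrow = 5,
                          dimnames = list(rev(genes), paste0("b", 1:3))))

prepared <- prepare_for_cia(toy1, toy2)
dim(prepared$x)                                         # samples x genes
identical(colnames(prepared$x), colnames(prepared$y))   # gene order agrees

# With the real data and made4 loaded, the call would then be:
# coin <- cia(prepared$x, prepared$y)
```

With the real datasets, each prepared table should have 24 rows (samples) and one column per shared gene.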

Let me know if this works
Aedin

Moghaddasi Gholami Amin wrote:
> Dear Aedin, 
> 
> Many thanks for your quick response. 
> Since I am going to investigate gene association networks using a reverse-engineering approach as part of my project, in the end I want to match on genes. As I mentioned, I used two datasets (data1723 and data2224), each with 24 samples (columns) and 9335 genes (rows). I have used these as toy datasets.
> If I have understood your suggestions correctly, should I first do some clustering and then coinertia, or...? Please correct me if I'm wrong.
> Any help/suggestions are appreciated. 
> 
> Thanks again.
> Best Regards,
> Amin.
> 
> 
> 
> 
> -----Original Message-----
> From: aedin culhane [mailto:aedin at jimmy.harvard.edu]
> Sent: Wed 11/7/2007 10:23 PM
> To: Moghaddasi Gholami Amin
> Subject: Re: made4, memory issues
>  
> Hi Amin,
> What size are your datasets?  Are you matching on genes (probesets) or
> samples? Coinertia analysis is memory intensive if running on samples,
> as there are lots of variables. However, it should run easily if you are
> matching on probesets.
> The code I have written expects matching on samples, which may explain 
> your problem.
> 
> If you are matching on probesets, transpose the data
> 
> coin <- cia(t(data1723),t(data2224))
> 
> If you are matching on the samples and have >3,000 probes/dataset, you
> will need to find ways to maximize your memory usage or get extra memory.
> 
> Let me know and I will try to advise you
> Regards
> Aedin
> 
> 
> Moghaddasi Gholami Amin wrote:
>> Dear Dr. Culhane, 
>>
>> My name is Amin, a PhD student at the German Cancer Research Center (DKFZ), Germany.
>>
>> I wanted to use "made4" to perform coinertia analysis to investigate the covariance between two datasets on the same platform (non-time series, Affymetrix GeneChip Yeast Genome S98, number of genes=9335 and number of array samples=24, each). When I perform "cia" on the matrices, I get the error message "cannot allocate vector of size 73 Kb", which means that I have run out of RAM.
>>
>> I was using a dual-core processor with 2GB of RAM running Kubuntu Gutsy 7.10, and I have since increased the RAM to 8GB to try to overcome this problem. Since I am going to analyze more datasets (probably some much bigger than the above), would you please advise me on ways to cope with the memory usage? Any other suggestions would be really appreciated.
>>
>> Please find the detailed information below, including traceback().
>>
>> Best Regards,
>> Amin.
>>
>>> data1723 <- read.csv('1723.csv',header=TRUE,row.names="geneNames")
>>> data2224 <- read.csv('2224.csv',header=TRUE,row.names="geneNames")
>>> coin <- cia(data1723,data2224)
>> Error: cannot allocate vector of size 73 Kb
>>> dim(data1723)
>> [1] 9335   24
>>> dim(data2224)
>> [1] 9335   24
>>
>>> sessionInfo()
>> R version 2.6.0 (2007-10-03)
>> i486-pc-linux-gnu
>>
>> locale:
>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] made4_1.12.1 ade4_1.4-5
>>
>> loaded via a namespace (and not attached):
>> [1] rcompgen_0.1-17
>>> traceback()
>> 4: unlist(vlist, recursive = FALSE, use.names = FALSE)
>> 3: data.frame(tabcoiner)
>> 2: coinertia(t.dudi(coa1), t.dudi(coa2), nf = cia.nf, scan = cia.scan,
>>       ...)
>> 1: cia(data1723, data2224)
>> ---
>>
>> $ free -m
>>              total       used       free     shared    buffers     cached
>> Mem:          3424       3320        103          0          5        115
>> -/+ buffers/cache:       3200        223
>> Swap:         6000         81       5919
>>
>>
>>
>> -------------------------------------
>> Amin Moghaddas Gholami
>> Functional Genome Analysis - B070
>> German Cancer Research Center - DKFZ
>> Im Neuenheimer Feld 580
>> D-69120 Heidelberg
>> Germany
>>
>> Phone: + 49 (0) 6221 42-2718
>> Fax:   + 49 (0) 6221 42-4687 
>> Email: a.moghaddasi at dkfz.de
>>
>> URL:   http://www.m-chips.org 
>>        http://www.dkfz.de/funct_genome