[BioC] Maximum number of CEL files for ReadAffy() in the affy package

Wed Jul 23 06:02:16 CEST 2008

Hi,

there is one more way to handle large data sets: the affyPara
package (http://www.bioconductor.org/packages/bioc/html/affyPara.html).
It requires a computer cluster and lets you run the preprocessing in
parallel.
With enough computers you can preprocess an essentially unlimited number
of arrays, and you also get a good speedup in computation time.

I think that for 2,000 arrays, 5-6 computers with 4 GB of memory each
should be enough (depending on the chip type).
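The rough R sketch below shows why one machine hits a wall no matter how much RAM it has, and why the chip type matters. It rests on two stated assumptions, neither of which comes from this thread:

* The chip is HG-U133 Plus 2 style, with 1164 x 1164 probe cells per CEL file.
* R versions before 3.0.0 cap a single matrix at 2^31 - 1 elements. When nrow * ncol exceeds that cap, allocMatrix() stops with "too many elements specified", which is the error in the quoted message.

All variable names are illustrative only. They are not part of affy or affyPara.

```r
# Illustrative back-of-the-envelope check (not part of affy or affyPara).
# Assumption: an HG-U133 Plus 2 style chip with 1164 x 1164 probe cells.
cells_per_array <- 1164 * 1164            # 1,354,896 intensities per CEL file

# Before R 3.0.0 a matrix could hold at most 2^31 - 1 elements;
# allocMatrix() errors with "too many elements specified" beyond that.
max_elements <- 2^31 - 1

# Largest number of arrays whose (cells x arrays) intensity matrix still fits.
max_arrays <- floor(max_elements / cells_per_array)
print(max_arrays)                          # 1584

# 241 arrays fit in one matrix, 2,035 do not.
print(241  * cells_per_array <= max_elements)   # TRUE
print(2035 * cells_per_array <= max_elements)   # FALSE

# Splitting 2,035 arrays over 6 nodes: arrays per node, and the size of one
# node's raw intensity matrix stored as doubles (8 bytes each), in GiB.
# This ignores any extra copies made during preprocessing.
arrays_per_node <- ceiling(2035 / 6)       # 340
node_gib <- arrays_per_node * cells_per_array * 8 / 1024^3
print(node_gib)                            # roughly 3.4 GiB
```

With a chip that has fewer cells, the per-array cost drops and more arrays fit per matrix and per node. That is one reason the node estimate depends on the chip type.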

Best
Markus

Hailong Cui wrote:
> Dear all,
>
> First, I apologize for the mass email. I've been reading manuals, googling,
> and searching the mailing list archive, but I still cannot find an exact
> answer to my problem.
>
> (1) Question: Can a large number of CEL files cause an overflow in the
> function ReadAffy() in the affy package? Is there any way to fix this?
> Other options seem to be other software, such as RMAExpress and dChip on
> Windows XP. Any suggestions?
>
> (2) Background: I am trying to read in all the CEL files in the
> directory to create an AffyBatch object, so that I can use functions in
> the affy package. More specifically, I want to do RMA and dChip
> normalization and get MA plots. On my workstation (48 64-bit CPUs, 500 GB
> of memory), ReadAffy() worked fine for 241 CEL files, but when I moved on
> to 2,035 CEL files, it failed and kept showing the error message below.
> Each CEL file has roughly 50k rows. On the bright side, I tried justRMA()
> and got the expression values in text format.
>
>> R
>> library(affy)
>> Data <- ReadAffy()
> Error in read.affybatch(filenames = l$filenames, phenoData = l$phenoData,  :
>   allocMatrix: too many elements specified
>
>
> FYI, below is the session information on my workstation.
>
>> sessionInfo()
> R version 2.7.1 (2008-06-23)
> ia64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] tools     stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] geneplotter_1.18.0          annotate_1.18.0
>  [3] xtable_1.5-2                AnnotationDbi_1.2.2
>  [5] RSQLite_0.6-9               DBI_0.2-4
>  [7] lattice_0.17-8              BufferedMatrixMethods_1.4.0
>  [9] BufferedMatrix_1.4.0        affy_1.18.2
> [11] preprocessCore_1.2.0        affyio_1.8.0
> [13] Biobase_2.0.1
>
> loaded via a namespace (and not attached):
> [1] grid_2.7.1         KernSmooth_2.22-22 RColorBrewer_1.0-2
>
> Thank you so much for reading this; I would appreciate your reply.
>
> Hailong