[BioC] Correlation works, but dist() runs out of memory

Wolfgang Huber huber at ebi.ac.uk
Tue Mar 13 18:12:14 CET 2007

Dear Daniel,

Please read the posting guide, which recommends that you give a 
reproducible example and the output of sessionInfo(). Also, there is no 
such thing as Bioconductor 0.9.

1) Are you sure you are giving it "only" a 22011 x 16 matrix? I get

 > a=numeric(2^31-1)
Error in vector("double", length) : cannot allocate vector of length 

 > a=numeric(2^31)
Error in vector("double", length) : vector size specified is too large

and of course 2^31 >> choose(22011,2).
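
The comparison above can be checked directly. This is only a sketch of the arithmetic: the variable names are made up for illustration, and it assumes that the relevant limit is the 2^31-1 element maximum seen in the session above, which R exposes as .Machine$integer.max.

```r
# dist() on an n-row matrix stores one value per pair of rows.
n_genes <- 22011
n_pairs <- choose(n_genes, 2)       # 22011 * 22010 / 2 = 242231055 pairs

# Assumed limit: the largest vector length accepted in the session above.
max_len <- .Machine$integer.max     # 2^31 - 1

n_pairs < max_len                   # is the dist object within the limit?
max_len / n_pairs                   # how many times over the limit exceeds it
```

If the real input gives a count above the limit, the matrix being passed to dist() is probably not the 22011 x 16 one.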

2) choose(22011,2)*8/1e6 = 1937.85, i.e. one copy of your distance matrix 
would need about 1.9 GB of RAM, and if you have other large objects around 
or if it needs to be copied, your 3 GB of RAM may not be enough. Rather than 
using brute force, reducing the set of genes to an interesting subset 
before doing the clustering might help.
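
As a rough sketch of both points, the following estimates the size of a dist object and then clusters a reduced set of genes. Everything here is illustrative: dist_size_mb is a hypothetical helper, the data are simulated rather than your ExonExpr, and keeping the 2000 most variable rows is an arbitrary choice, not a recommendation for what counts as an "interesting" subset.

```r
# Hypothetical helper: size in MB of a dist object for n rows,
# i.e. choose(n, 2) doubles at 8 bytes each (ignoring any extra copies).
dist_size_mb <- function(n) choose(n, 2) * 8 / 1e6

dist_size_mb(22011)   # about 1938 MB for all genes
dist_size_mb(2000)    # about 16 MB for a 2000-gene subset

# Simulated stand-in for an expression matrix (rows = genes, columns = arrays).
set.seed(1)
expr <- matrix(rnorm(5000 * 16), nrow = 5000, ncol = 16)

# Keep the 2000 rows with the largest variance (arbitrary cutoff, assumption).
row_var  <- apply(expr, 1, var)
keep     <- order(row_var, decreasing = TRUE)[1:2000]
expr_sub <- expr[keep, ]

# dist() computes Euclidean distances between the rows of its argument.
d    <- dist(expr_sub)
hier <- hclust(d, method = "average")
```

Any other filter (for example, on mean intensity or on a list of genes of interest) would fit in the same place as the variance filter.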

 > sessionInfo()
R version 2.5.0 Under development (unstable) (2007-03-13 r40832)


attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

Best wishes

Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber

> I am attempting to plot a hierarchical clustering dendrogram of a
> modestly sized gene expression matrix of 22011 x 16.
> If I choose to use a correlation measure it works fine (
> c2 <- cor(ExonExpr)
> d2 <- as.dist(1-c2)
> hier2 <- hclust(d2,method="average")
> ).  If I try to create a Euclidean distance object it crashes out with a
> memory error (
>> Error in vector("double", length) : vector size specified is too large
> ).
> This seems strange as I have 3 GB of RAM, which I would think is plenty.
> Any ideas what is going wrong or how to get round this?
> Thanks
> Dan
> PS Running R 2.4.1, Bioconductor 0.9 on SUSE 10.2 Linux.

