[R] cor.test() running out of memory on 64-bit system

Alex Reynolds reynolda at uw.edu
Fri Jun 3 13:37:45 CEST 2011


I am running into memory problems when calculating correlation scores with cor.test() on R 2.13.0:

  R version 2.13.0 (2011-04-13) ...
  Platform: x86_64-unknown-linux-gnu (64-bit)

In my test case, I read in a pair of ~150M vectors from text files using the pipe() and scan() functions, which pull a specific column of numeric values out of each text file. Once I have the two vectors, I run cor.test() on them.
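
For reference, a minimal sketch of this step (the shell command, file names, and column choice below are only placeholders):

  # pull one numeric column out of each text file via a shell pipeline
  x <- scan(pipe("cut -f3 sample_a.txt"), what = double())
  y <- scan(pipe("cut -f3 sample_b.txt"), what = double())

  # correlation test on the two vectors (Pearson by default)
  ct <- cor.test(x, y)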

If I run this on our compute cluster (running SGE), I can set hard limits on the memory assigned to the compute slot or node that my R task is sent to (this keeps R from grabbing so much memory that other, non-R tasks on the cluster stall and fail).

If I set the hard limits (h_data and h_vmem) below 8 GB, the R task exits early with the following error:

  Error: cannot allocate vector of size 2.0 Gb
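
For what it's worth, the size in that message corresponds to a single object of roughly a quarter of a billion doubles (assuming 8 bytes per double and R's convention of 1 Gb = 1024^3 bytes):

  # number of doubles in the 2.0 Gb object R failed to allocate
  2 * 1024^3 / 8    # about 268 million values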

What is confusing to me is that I have a 64-bit version of R, so I would expect hard limits of about 4 GB, or say 5 GB with a generous 1 GB allowance for overhead, to be enough for this input size (2 GB per vector x 2 vectors).

Based on the hard limits, though, the overhead appears to be closer to 4 GB on its own, on top of the 4 GB for the two input vectors: with hard limits under 8 GB, the job fails.
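
One way I could probe the overhead from inside R itself, rather than via the scheduler's hard limit, would be something along these lines (a sketch; gc() only reports R's own heap, not necessarily everything the process maps):

  # measure peak memory used across the cor.test() call
  gc(reset = TRUE)       # reset the "max used" counters
  ct <- cor.test(x, y)
  gc()                   # "max used" columns show the peak since the reset, in Mb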

Does cor.test() really require this much extra space, or have I missed some compilation option or other setting that would address this aspect of running cor.test()?
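
For context, the default (Pearson) test statistic only needs the correlation coefficient and the vector length, so a hand-rolled version along these lines (an assumption-laden sketch, not necessarily what cor.test() does internally) is what I would compare the memory behaviour against:

  # Pearson correlation t-test computed directly (assumes no NAs in x or y)
  r     <- cor(x, y)
  n     <- length(x)
  tstat <- r * sqrt((n - 2) / (1 - r^2))
  pval  <- 2 * pt(-abs(tstat), df = n - 2)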

Thanks for your advice.

Regards,
Alex

