[R] Memory usage and limit
roebuck at mdanderson.org
Thu Apr 27 07:32:44 CEST 2006
On Wed, 26 Apr 2006, Min Shao wrote:
> I recently made a 64-bit build of R-2.2.1 under Solaris 9 using gcc v.3.4.2.
> The server has 12GB memory, 6 Sparc CPUs and plenty of swap space. I was the
> only user at the time of the following experiment.
> I wanted to benchmark R's capability to read large data files and used a
> data set consisting of 2MM records with 65 variables in each row. All but 2
> of the variables are of the character type and the other two are numeric.
> The whole data set is about 600 MB when stored as a plain ASCII file.
> The following code was used in the benchmarking runs:
> c = list(var1=0, var2=0, var3="", var4="", .....var65="")
> A <- scan("test.dat", skip = 1, sep = ",", what = c, nmax=XXXXX,
> where XXXXX = 1000000 or 2000000
> I made two runs with nmax=1000000 and nmax=2000000 respectively. The first
> run completed successfully, in about an hour of CPU time. However, the actual
> memory usage exceeded 2.2GB, about 7 times the actual file size on disk.
> The second run aborted when the memory usage reached 4GB. The error message
> is "vector memory exhausted (limit reached?)".
> Three questions:
> 1) Why was so much memory and CPU time consumed to read 300MB of data? Since
> almost all of the variables are character, I expected almost a 1-1 mapping
> between file size on disk and that in memory.
> 2) Since this is a 64-bit build, I expected it could handle more than the
> 600MB of data I used. What does the error message mean? I don't believe the
> vector length exceeded the theoretical limit of about 1 billion.
> 3) The original file was compressed and I had to uncompress it before the
> experiment. Is there a way to read compressed files directly in R?
A <- scan(gzfile("test.dat.gz", "r"),
skip = 1,
sep = ",",
what = c,
nmax = XXXXX,
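
For anyone who wants to try the gzfile() approach on something small first,
here is a self-contained sketch. The file contents, the three column names
and the name 'template' are made up for illustration (your real file has 65
columns); only gzfile(), writeLines(), scan() and close() are actual R
functions. I used 'template' instead of 'c' for the 'what' list so that it
does not mask base::c.

```r
# Write a tiny gzip-compressed, comma-separated file to a temporary path.
# (Illustrative data only; this stands in for test.dat.gz.)
path <- tempfile(fileext = ".dat.gz")
con <- gzfile(path, "w")
writeLines(c("var1,var2,var3",
             "1,2.5,abc",
             "2,3.5,def",
             "3,4.5,ghi"), con)
close(con)

# One list element per column; the type of each value sets the column type.
template <- list(var1 = 0, var2 = 0, var3 = "")

# Open the compressed file for reading; scan() decompresses as it reads.
con <- gzfile(path, "r")
A <- scan(con, what = template, sep = ",", skip = 1, nmax = 2, quiet = TRUE)
close(con)  # scan() does not close a connection that was already open

str(A)
```

With 'what' given as a list, nmax counts records rather than individual
values, so the sketch stops after the first two data rows.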
SIGSIG -- signature too long (core dumped)