[R] Memory usage and limit
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Apr 27 11:06:58 CEST 2006
R character vectors are stored as a list of character strings. On a 64-bit
system, each string has an overhead of about 64 bytes. R nowadays shares
strings if they are the same, but only for the first 'few': it gives up
after 10,000 distinct strings. Nevertheless, for many distinct short
strings this is very inefficient.
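A rough way to see the cost (exact figures vary by platform and R version) is
to watch gc()'s report of memory in use while building comparable vectors of
shared and of distinct strings:

    gc()                              # baseline
    x <- rep("abcdefgh", 1e6)         # one string repeated: elements share storage
    gc()                              # modest increase, mostly the pointer vector
    y <- paste("s", 1:1e6, sep = "")  # a million distinct short strings
    gc()                              # much larger increase: per-string overhead dominates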
On Wed, 26 Apr 2006, Min Shao wrote:
> Hello everyone,
>
> I recently made a 64-bit build of R-2.2.1 under Solaris 9 using gcc v.3.4.2.
That's an inadvisable version of gcc, with a bug in g77 which affects some
R packages.
> The server has 12GB memory, 6 Sparc CPUs and plenty of swap space. I was the
> only user at the time of the following experiment.
>
> I wanted to benchmark R's capability to read large data files and used a
> data set consisting of 2MM records with 65 variables in each row. All but 2
> of the variables are of the character type and the other two are numeric.
> The whole data set is about 600 MB when stored as plain ASCII file.
>
> The following code was used in the benchmarking runs:
>
> c = list(var1=0, var2=0, var3="", var4="", .....var65="")
> A <- scan("test.dat", skip = 1, sep = ",", what = c, nmax=XXXXX,
> quiet=FALSE)
> summary(A)
> where XXXXX = 1000000 or 2000000
>
> I made two runs with nmax=1000000 and nmax=2000000 respectively. The first
> run completed successfully, in about an hour of CPU time. However, the actual
> memory usage exceeded 2.2GB, about 7 times the actual file size on disk.
> The second run aborted when the memory usage reached 4GB. The error message
> is "vector memory exhausted (limit reached?)".
>
> Three questions:
> 1) Why was so much memory and CPU time consumed to read 300MB of data? Since
> almost all of the variables are character, I expected almost a 1-1 mapping
> between the file size on disk and the size in memory.
> 2) Since this is a 64-bit build, I expected it could handle more than the
> 600MB of data I used. What does the error message mean? I don't believe the
> vector length exceeded the theoretical limit of about 1 billion.
> 3) The original file was compressed and I had to uncompress it before the
> experiment. Is there a way to read compressed files directly in R?
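For what it's worth, scan() and read.table() accept connection objects, so a
gzip-compressed file can be read without uncompressing it on disk first. A
minimal sketch (the file name and the abbreviated column template here are
illustrative):

    what <- list(var1 = 0, var2 = 0, var3 = "")   # extend to all 65 columns
    A <- scan(gzfile("test.dat.gz"), skip = 1, sep = ",",
              what = what, nmax = 1000000, quiet = FALSE)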
>
> Thanks so much for your help.
>
> Min
>
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595