[R] Memory usage and limit
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Apr 27 11:06:58 CEST 2006
R character vectors are stored as a list of character strings. On a 64-bit
system, each string has an overhead of about 64 bytes. R nowadays shares
strings if they are the same, but only for the first 'few': it gives up
after 10,000 distinct strings. Nevertheless, for many distinct short
strings this is very inefficient.
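A rough way to see the cost (exact figures vary by platform and R version) is
to watch gc()'s report of memory in use while building comparable vectors of
shared and of distinct strings:

    gc()                              # baseline
    x <- rep("abcdefgh", 1e6)         # one string repeated: elements share storage
    gc()                              # modest increase, mostly the pointer vector
    y <- paste("s", 1:1e6, sep = "")  # a million distinct short strings
    gc()                              # much larger increase: per-string overhead dominates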
On Wed, 26 Apr 2006, Min Shao wrote:
> Hello everyone,
>
> I recently made a 64-bit build of R-2.2.1 under Solaris 9 using gcc v.3.4.2.
That's an inadvisable version of gcc, with a bug in g77 which affects some
R packages.
> The server has 12GB memory, 6 Sparc CPUs and plenty of swap space. I was the
> only user at the time of the following experiment.
>
> I wanted to benchmark R's capability to read large data files and used a
> data set consisting of 2MM records with 65 variables in each row. All but 2
> of the variables are of the character type and the other two are numeric.
> The whole data set is about 600 MB when stored as plain ASCII file.
>
> The following code was used in the benchmarking runs:
>
> c = list(var1=0, var2=0, var3="", var4="", .....var65="")
> A <- scan("test.dat", skip = 1, sep = ",", what = c, nmax=XXXXX,
> quiet=FALSE)
> summary(A)
> where XXXXX = 1000000 or 2000000
>
> I made two runs with nmax=1000000 and nmax=2000000 respectively. The first
> run completed successfully, in about an hour of CPU time. However, the actual
> memory usage exceeded 2.2GB, about 7 times the actual file size on disk.
> The second run aborted when the memory usage reached 4GB. The error message
> is "vector memory exhausted (limit reached?)".
>
> Three questions:
> 1) Why was so much memory and CPU time consumed to read 300MB of data? Since
> almost all of the variables are character, I expected almost a 1-1 mapping
> between the file size on disk and the size in memory.
> 2) Since this is a 64-bit build, I expected it could handle more than the
> 600MB of data I used. What does the error message mean? I don't believe the
> vector length exceeded the theoretical limit of about 1 billion.
> 3) The original file was compressed and I had to uncompress it before the
> experiment. Is there a way to read compressed files directly in R?
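For what it's worth, scan() and read.table() accept connection objects, so a
gzip-compressed file can be read without uncompressing it on disk first. A
minimal sketch (the file name and the abbreviated column template here are
illustrative):

    what <- list(var1 = 0, var2 = 0, var3 = "")   # extend to all 65 columns
    A <- scan(gzfile("test.dat.gz"), skip = 1, sep = ",",
              what = what, nmax = 1000000, quiet = FALSE)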
>
> Thanks so much for your help.
>
> Min
>
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595