[R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Jun 26 18:53:28 CEST 2007
The R Data Import/Export Manual points out several ways in which you can
use read.csv more efficiently.
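For instance, specifying the column classes and a row-count hint (both documented there) avoids much of the intermediate allocation seen below. A sketch only; the column types here are hypothetical:

```r
## Sketch of the manual's advice (the colClasses vector is hypothetical):
## - colClasses stops read.csv() guessing each column's type from the data,
## - nrows pre-sizes the result and avoids repeated reallocation,
## - comment.char = "" turns off comment scanning.
s <- read.csv("s.csv",
              colClasses = c("integer", rep("numeric", 21)),
              nrows = 500000,
              comment.char = "")
```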
On Tue, 26 Jun 2007, ivo welch wrote:
> dear R experts:
>
> I am of course no R expert, but I use it regularly. I thought I would
> share some experimentation with memory use. I run a linux machine
> with about 4GB of memory, and R 2.5.0.
>
> upon startup, gc() reports
>
>            used (Mb) gc trigger (Mb) max used (Mb)
> Ncells  268755 14.4     407500 21.8   350000 18.7
> Vcells  139137  1.1     786432  6.0   444750  3.4
>
> This is my baseline. linux 'top' reports 48MB as the baseline. This
> includes some of my own routines that are always loaded. Good.
>
>
> Next, I created an s.csv file with 22 variables and 500,000
> observations, taking up 115MB of uncompressed disk space. The
> resulting object.size() after a read.csv() is 84,002,712 bytes (80MB).
>
>> s= read.csv("s.csv");
>> object.size(s);
>
> [1] 84002712
>
>
> here is where things get more interesting. after the read.csv() is
> finished, gc() reports
>
>              used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells    270505 14.5    8349948 446.0 11268682 601.9
> Vcells  10639515 81.2   34345544 262.1 42834692 326.9
>
> I was a bit surprised by this---R had used 928MB of intermediate
> memory. More interestingly, this is also similar to what linux 'top'
> reports as memory use of the R process (919MB; the difference is
> probably 1024 vs. 1000 B/MB), even after read.csv() has finished and
> gc() has been run. Nothing seems to have been released back to the OS.
>
> Now,
>
>> rm(s)
>> gc()
>            used (Mb) gc trigger  (Mb) max used  (Mb)
> Ncells  270541 14.5    6679958 356.8 11268755 601.9
> Vcells  139481  1.1   27476536 209.7 42807620 326.6
>
> linux 'top' now reports 650MB of memory use (though R itself uses only
> 15.6MB). My guess is that it keeps the trigger memory of 567MB plus
> the base 48MB.
>
>
> There are two interesting observations for me here: first, to read a
> .csv file, I need at least 10-15 times as much memory as the size of
> the file that I want to read---a lot more than the factor of 3-4 that
> I had expected. The moral is that IF R can read a .csv file, one need
> not worry too much about running into memory constraints later on. {R
> Developers---reducing read.csv's memory requirement a little would be
> nice. Of course, you have more than enough on your plate already.}
>
> Second, memory is not returned fully to the OS. This is not
> necessarily a bad thing, but good to know.
>
> Hope this helps...
>
> Sincerely,
>
> /iaw
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595