[R] Memory Experimentation: Rule of Thumb = 10-15 Times the Memory
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Aug 9 23:18:32 CEST 2007
On Thu, 9 Aug 2007, Charles C. Berry wrote:
> On Thu, 9 Aug 2007, Michael Cassin wrote:
>> I really appreciate the advice and this database solution will be useful to
>> me for other problems, but in this case I need to address the specific
>> problem of scan and read.* using so much memory.
>> Is this expected behaviour?
Yes, and documented in the 'R Internals' manual. That is basic reading
for people wishing to comment on efficiency issues in R.
>> Can the memory usage be explained, and can it be
>> made more efficient? For what it's worth, I'd be glad to try to help if the
>> code for scan is considered to be worth reviewing.
> This does not seem to be an issue with scan() per se.
> Notice the difference in size of big2, big3, and bigThree here:
>> big2 <- rep(letters,length=1e6)
>  4.000856
>> big3 <- paste(big2,big2,sep='')
>  36.00002
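The figures above appear to be object.size() results scaled to MB (object.size is real R; the MB scaling is my assumption, since the measuring call was elided from the quote). A minimal sketch of the comparison:

```r
# Compare a character vector built by recycling 26 shared strings
# with one built by paste(), which (before R 2.6.0) allocated a
# fresh string object for every element.
big2 <- rep(letters, length = 1e6)    # 1e6 elements, 26 shared strings
big3 <- paste(big2, big2, sep = "")   # 1e6 elements ("aa", "bb", ...)

object.size(big2) / 1e6   # approximate size in MB
object.size(big3) / 1e6
```

Note that since R 2.6.0 strings are cached globally, so a current R will report a far smaller size for big3 than the 36 MB quoted above; exact numbers depend on platform and R version.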
On a 32-bit computer every R object has an overhead of 24 or 28 bytes.
Character strings are R objects, but in some functions such as rep (and
scan for up to 10,000 distinct strings) the objects can be shared. More
string objects will be shared in 2.6.0 (but factors are designed to be
efficient at storing character vectors with few values).
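A factor stores one 4-byte integer code per element plus a table of the distinct levels, so a long vector with few distinct values is cheap to hold as a factor. A rough illustration (the size relationship holds generally, though exact byte counts vary by platform and R version):

```r
# One million elements but only three distinct values
x <- rep(c("low", "mid", "high"), length = 1e6)
f <- factor(x)

object.size(x)   # character vector: one pointer per element + strings
object.size(f)   # factor: one integer code per element + 3 level strings
```

On a 64-bit machine the character vector needs 8 bytes per element just for the pointers, against 4 bytes per element for the factor's integer codes.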
On a 64-bit computer the overhead is usually double, so I would expect
just over 56 bytes/string for distinct short strings (and that is what
was reported here). But 56Mb is really not very much (tiny on a 64-bit
computer), and 1 million items is a lot.
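The per-string cost for genuinely distinct strings can be measured directly by building a vector where every element differs (as.character and object.size are real R; the ~56-byte figure comes from the text above, and a given platform or R version need not reproduce it exactly):

```r
# One million distinct short strings: "1", "2", ..., "1000000".
# No sharing is possible, so the full per-string cost is paid.
x <- as.character(seq_len(1e6))
bytes_per_string <- as.numeric(object.size(x)) / length(x)
bytes_per_string   # roughly the per-object overhead plus one pointer
```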
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel: +44 1865 272861 (self)
1 South Parks Road,                    +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax: +44 1865 272595