[R] Memory Fragmentation in R
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sat Feb 19 20:52:12 CET 2005
BTW, I think this is really an R-devel question, and if you want to pursue
this please use that list. (See the posting guide as to why I think so.)
This looks like fragmentation of the address space: many of us are using
64-bit OSes with 2-4Gb of RAM precisely to avoid such fragmentation.
Notice (memory.c line 1829 in the current sources) that large vectors are
malloc-ed separately, so this is a malloc failure, and there is not a lot
R can do about how malloc fragments the process address space
(presumably 32-bit in your case, as you did not say).
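
Schematically, the effect is something like the following (a contrived
sketch, not your code; whether it actually fails depends on the malloc
implementation and on the address space being 32-bit):

    ## allocate ~800Mb in 100 large vectors, each malloc-ed separately
    x <- vector("list", 100)
    for (i in seq_along(x)) x[[i]] <- numeric(1e6)   # ~8Mb each
    ## free every other one: ~400Mb is now free, but in ~8Mb holes
    x[seq(1, 99, by = 2)] <- list(NULL)
    invisible(gc())
    ## a single contiguous ~400Mb request may still fail
    try(y <- numeric(5e7))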
The message
    1101.7 Mbytes of heap free (51%)
is a legacy of an earlier gc() and is not really `free': I believe it
means something like `may be allocated before garbage collection is
triggered': see memory.c.
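
You can watch these figures yourself:

    gcinfo(TRUE)     # print a report at each garbage collection
    invisible(gc())  # the `heap free' figure is headroom below the
                     # current gc trigger, not a contiguous free block
    gcinfo(FALSE)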
On Sat, 19 Feb 2005, Nawaaz Ahmed wrote:
> I have a data set of roughly 700MB which during processing grows to
> about 2GB (I'm using a 4GB Linux box). After the work is done I clean
> up with rm() and the state returns to 700MB. Yet I find I cannot run
> the same routine again, as it claims to be unable to allocate memory,
> even though gcinfo() claims there is 1.1GB left.
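>
> Schematically (the data and processing here are small stand-ins for
> the real thing):
>
>     dat <- matrix(rnorm(1e6), ncol = 10)  # stand-in for the 700MB data
>     out <- apply(dat, 2, cumsum)          # stand-in for the processing
>     rm(out); invisible(gc())              # temporaries returned
>     out <- apply(dat, 2, cumsum)          # this second pass is what fails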
>
> At the start of the second time
> ===============================
>             used  (Mb) gc trigger   (Mb)
> Ncells   2261001  60.4    3493455   93.3
> Vcells  98828592 754.1  279952797 2135.9
>
> Before Failing
> ==============
> Garbage collection 459 = 312+51+96 (level 0) ...
> 1222596 cons cells free (34%)
> 1101.7 Mbytes of heap free (51%)
> Error: cannot allocate vector of size 559481 Kb
>
> This looks like a fragmentation problem. Does anyone have a handle on
> this situation (i.e., any workaround)? Is anyone working on improving
> R's fragmentation behaviour?
>
> On the other hand, is it possible there is a memory leak? In order to
> make my functions work on this dataset I tried to eliminate copies by
> coding with references (basic new.env() tricks, roughly the pattern
> sketched below). I presume that my clean-up released the temporary
> data (as evidenced by the gc output at the start of the second round
> of processing). Is it possible that it was not really cleaned up and
> is sitting around somewhere, even though gc() thinks it has been
> returned?
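>
> Roughly the pattern I mean (the data and sizes here are stand-ins):
>
>     state <- new.env()
>     state$data <- matrix(rnorm(1e6), ncol = 10)  # stored once
>     scale.in.place <- function(env) {
>         ## environments are not duplicated on assignment, so the
>         ## caller sees the update without the object being copied on
>         ## the call or on assigning a returned value back
>         env$data <- scale(env$data)
>         invisible(NULL)
>     }
>     scale.in.place(state)
>     rm(state); gc()   # drop the last reference and collect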
>
> Thanks - any clues to follow up will be very helpful.
> Nawaaz
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595