[R] Memory Fragmentation in R

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Feb 19 20:52:12 CET 2005


BTW, I think this is really an R-devel question, and if you want to pursue 
this please use that list.  (See the posting guide as to why I think so.)

This looks like fragmentation of the address space: many of us are using 
64-bit OSes with 2-4Gb of RAM precisely to avoid such fragmentation.

Notice (memory.c, line 1829 in the current sources) that large vectors are 
malloc-ed separately, so this is a malloc failure, and there is not a lot 
R can do about how malloc fragments the process address space (presumably 
32-bit in your case, as you did not say).

The message
   1101.7 Mbytes of heap free (51%)
is a legacy of an earlier gc() and is not really `free': I believe it 
means something like `may be allocated before garbage collection is 
triggered': see memory.c.
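
You can compute the same figure from the gc() result yourself (a minimal 
sketch; column positions as in the current gc() output):

    gcinfo(TRUE)              # report each garbage collection as it happens
    g <- gc()                 # gc() returns the table behind those reports
    ## `heap free' is roughly (gc trigger) - (used) for the Vcells row,
    ## i.e. how much may still be allocated before the next collection is
    ## triggered -- not a contiguous free block.  Columns 2 and 4 hold Mb.
    g["Vcells", 4] - g["Vcells", 2]
    gcinfo(FALSE)             # back to quiet collections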


On Sat, 19 Feb 2005, Nawaaz Ahmed wrote:

> I have a data set of roughly 700MB which grows to about 2GB during 
> processing (I'm using a 4GB Linux box). After the work is done I clean up 
> (rm()) and the state returns to 700MB. Yet I find I cannot run the same 
> routine again, as it claims not to be able to allocate memory, even though 
> the gc output (enabled via gcinfo()) claims there is 1.1GB free.
>
> 	At the start of the second time
> 	===============================
>          	 used  (Mb) gc trigger   (Mb)
> 	Ncells  2261001  60.4    3493455   93.3
> 	Vcells 98828592 754.1  279952797 2135.9
>
> 	Before Failing
> 	==============
> 	Garbage collection 459 = 312+51+96 (level 0) ...
> 	1222596 cons cells free (34%)
> 	1101.7 Mbytes of heap free (51%)
> 	Error: cannot allocate vector of size 559481 Kb
>
> This looks like a fragmentation problem. Does anyone have a handle on this 
> situation (i.e. any workaround)? Is anyone working on improving R's 
> fragmentation behaviour?
>
> On the other hand, is it possible there is a memory leak? To make my 
> functions work on this dataset I tried to eliminate copies by coding with 
> references (basic new.env() tricks; sketched after this message). I presume 
> that my cleanup returned the temporary data, as the gc output at the start 
> of the second round of processing suggests. Is it possible that it was not 
> really freed and is sitting around somewhere, even though gc() thinks it 
> has been returned?
>
> Thanks - any clues to follow up will be very helpful.
> Nawaaz
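
PS: for readers unfamiliar with the idiom, `coding with references' via 
new.env() looks roughly like this (a sketch with made-up names, not 
Nawaaz's actual code). An environment is passed by reference, so updates 
made inside a function persist without the object being copied on the way in:

    make_ref <- function(data) {
      ## Wrap the data in an environment: environments are not duplicated
      ## when passed to functions, unlike ordinary R objects.
      e <- new.env(parent = emptyenv())
      e$data <- data
      e
    }
    scale_in_place <- function(ref, k) {
      ref$data <- ref$data * k   # new vector, rebound in the shared env
      invisible(ref)
    }
    r <- make_ref(runif(10))
    scale_in_place(r, 2)         # no copy of `r' itself is made
    rm(r); gc()                  # contents are reclaimable once unreachable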

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595