[Rd] Re: [R] Memory Fragmentation in R

Luke Tierney luke at stat.uiowa.edu
Sat Feb 19 23:58:23 CET 2005


On Sat, 19 Feb 2005, Nawaaz Ahmed wrote:

> Thanks, Brian. I looked at the code (memory.c) after I sent out the first 
> email and noticed the malloc() call that you mention in your reply.
> Looking into the code suggested a possible scenario where R would fail in 
> malloc() even when it has enough reclaimable heap space.
>
> I noticed that if there is enough heap address space (memory.c:1796, 
> VHEAP_FREE() > alloc_size), the garbage collector is not run. So malloc() 
> could fail (since there is no more address space to use) even though R 
> itself has enough free space that it could reclaim. A simple fix is for R 
> to attempt a garbage collection when malloc() fails.
>
> I hacked memory.c to look in R_GenHeap[LARGE_NODE_CLASS].New if malloc() 
> fails (in a very similar fashion to ReleaseLargeFreeVectors()).
> I did a "best-fit" steal from this list and returned the result to 
> allocVector() (see the sketch below). This seemed to fix my particular 
> problem - the large vectors I had allocated in the previous round were 
> still sitting in this list. Of course, the right thing to do is to check 
> whether there are any free vectors of the right size before calling 
> malloc() - but it was simpler to do it my way (because I did not have to 
> worry about how efficient my best-fit was; memory allocation was going to 
> fail anyway).
>
> I can look deeper into this and provide more details if needed.
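
A minimal sketch of the best-fit scan described above (illustrative only,
not the actual patch): it walks the LARGE_NODE_CLASS New list with the
NEXT_NODE macro from memory.c, and node_size() is a hypothetical helper
standing in for however the real code computes a node's allocation size.

    /* Sketch: find the smallest not-yet-released large node that can
       hold `needed' bytes, so its storage can be reused instead of
       asking malloc() for fresh address space. */
    static SEXP steal_best_fit(R_size_t needed)
    {
        SEXP base = R_GenHeap[LARGE_NODE_CLASS].New;
        SEXP s, best = NULL;
        R_size_t best_size = 0;

        for (s = NEXT_NODE(base); s != base; s = NEXT_NODE(s)) {
            R_size_t size = node_size(s);   /* hypothetical helper */
            if (size >= needed && (best == NULL || size < best_size)) {
                best = s;
                best_size = size;
            }
        }
        return best;   /* NULL if no free node is large enough */
    }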

Thanks.  It looks like it would be a good idea to modify the malloc()
call at that point to try a GC if the malloc() fails, then retry the
malloc() and only bail out if the second malloc() also fails.  I want to
think this through a bit more before going ahead, but I think it will be
the right thing to do.
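
In outline, that would look something like this (control flow only, with
simplified names; the real allocation path in memory.c is more involved,
and R_gc() stands in for the internal collector entry point):

    /* Try the allocation; on failure, run a full collection so that
       freed large vectors are released back to the allocator, then
       retry exactly once before giving up. */
    void *mem = malloc(size);
    if (mem == NULL) {
        R_gc();                 /* may release large free vectors */
        mem = malloc(size);
        if (mem == NULL)
            error("cannot allocate vector of size %lu Kb",
                  (unsigned long) (size / 1024));
    }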

Best,

luke


>
> Nawaaz
>
> Prof Brian Ripley wrote:
>> BTW, I think this is really an R-devel question, and if you want to pursue 
>> this please use that list.  (See the posting guide as to why I think so.)
>> 
>> This looks like fragmentation of the address space: many of us are using 
>> 64-bit OSes with 2-4GB of RAM precisely to avoid such fragmentation.
>> 
>> Notice (memory.c line 1829 in the current sources) that large vectors are 
>> malloc-ed separately, so this is a malloc() failure, and there is not a lot 
>> R can do about how malloc() fragments the (presumably, in your case, since 
>> you did not say) 32-bit process address space.
>> 
>> The message
>>   1101.7 Mbytes of heap free (51%)
>> is a legacy of an earlier gc() and is not really `free': I believe it means 
>> something like `may be allocated before garbage collection is triggered': 
>> see memory.c.
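
Roughly, the quantity reported there is headroom below the GC trigger,
not memory the OS can still supply. A sketch of the semantics (not the
literal macro from memory.c; vector_cells_in_use is a hypothetical
stand-in for the real bookkeeping variables):

    /* "Heap free" as reported under gcinfo(TRUE): space that may still
       be allocated before the next GC is triggered, measured in vector
       cells.  It says nothing about whether malloc() can actually find
       a contiguous block of that size. */
    R_size_t heap_free = R_VSize              /* current GC trigger */
                       - vector_cells_in_use; /* hypothetical name  */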
>> 
>> 
>> On Sat, 19 Feb 2005, Nawaaz Ahmed wrote:
>> 
>>> I have a data set of roughly 700MB which during processing grows to 2GB 
>>> (I'm using a 4GB Linux box). After the work is done I clean up with rm() 
>>> and the state returns to 700MB. Yet I find I cannot run the same routine 
>>> again: it fails to allocate memory, even though gcinfo() claims there is 
>>> 1.1GB left.
>>> 
>>>     At the start of the second time
>>>     ===============================
>>>               used  (Mb) gc trigger   (Mb)
>>>     Ncells  2261001  60.4    3493455   93.3
>>>     Vcells 98828592 754.1  279952797 2135.9
>>> 
>>>     Before Failing
>>>     ==============
>>>     Garbage collection 459 = 312+51+96 (level 0) ...
>>>     1222596 cons cells free (34%)
>>>     1101.7 Mbytes of heap free (51%)
>>>     Error: cannot allocate vector of size 559481 Kb
>>> 
>>> This looks like a fragmentation problem. Does anyone have a handle on this 
>>> situation (i.e., any workaround)? Is anyone working on improving R's 
>>> fragmentation problems?
>>> 
>>> On the other hand, is it possible there is a memory leak? In order to make 
>>> my functions work on this dataset I tried to eliminate copies by coding 
>>> with references (basic new.env() tricks). I presume my cleanup released the 
>>> temporary data (as the gc output at the start of the second round of 
>>> processing suggests). Is it possible that it was not really cleaned up and 
>>> is sitting around somewhere, even though gc() thinks it has been returned?
>>> 
>>> Thanks - any clues to follow up will be very helpful.
>>> Nawaaz
>> 
>> 
>
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Luke Tierney
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu


