[Rd] Re: [R] Memory Fragmentation in R

Nawaaz Ahmed nawaaz at inktomi.com
Sun Feb 20 03:40:02 CET 2005


> 
> I am unclear what you actually did, but it may be a judicious gc() is 
> all that was needed: otherwise the issues should be the same in the 
> first and the subsequent run.  That's not to say that when the trigger 
> gets near the total address space we could not do better: and perhaps we 
> should not let it do so (if we could actually determine the size of 
> the address space ... it is 2Gb or 3Gb on Windows for example).
> 

I did call gc(), but only in the top-level functions; internal 
functions in libraries/packages were also allocating space.

Here is how I think the problem happens. Consider code of the form
        x = as.vector(x)
        y = as.double(y)
where x is a 500MB matrix and y is 100MB.

Let's say we have 1201MB in total.
        Initially:
            x has 500MB, y has 100MB
            heap can grow by 601MB

        x = as.vector(x):
            x has 500MB, y has 100MB
            as.vector() duplicated 500MB (to be garbage collected)
            heap can grow by 101MB

        y = as.double(y):
            x has 500MB, y has 100MB
            R has 500MB to be garbage collected
            as.double() requires 100MB for duplicating y
            garbage collector is not run
                - required amount (100MB) < possible heap growth (101MB)
            allocVector() calls malloc()
                - malloc() can fail at this point
                - it cannot get contiguous 100MB
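
The arithmetic in the trace above can be checked with a small toy model.
This is only a sketch of the accounting as I described it, not R's memory
manager: ToyHeap and the toy_* functions are hypothetical names, sizes are
in MB, and the rule "collect only when the request exceeds the remaining
headroom" is my simplified reading of the trigger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the heap accounting in the trace. Sizes are in MB.
   All names are hypothetical; this is not R's memory manager. */
typedef struct {
    size_t limit;    /* total memory assumed available (1201 in the trace) */
    size_t live;     /* memory reachable from x and y */
    size_t garbage;  /* memory already dead but not yet collected */
} ToyHeap;

/* How far the heap can still grow before it reaches the limit. */
size_t toy_headroom(const ToyHeap *h) {
    assert(h->live + h->garbage <= h->limit);
    return h->limit - h->live - h->garbage;
}

/* Assumed trigger rule: the collector runs only when the request
   does not fit into the remaining headroom. */
bool toy_gc_would_run(const ToyHeap *h, size_t request) {
    return request > toy_headroom(h);
}

/* Model `v = f(v)` where f() duplicates its argument: the copy
   replaces the original as live data, and the original becomes
   garbage that stays on the heap until the next collection. */
void toy_duplicate(ToyHeap *h, size_t size) {
    h->garbage += size;
}
```

Starting from {1201, 600, 0}, toy_headroom() gives 601. After
toy_duplicate(&h, 500) it gives 101, and toy_gc_would_run(&h, 100) is
false, so the 100MB request goes straight to malloc() while 500MB of
garbage is still sitting in the heap.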

You are right, it is most likely to happen when the trigger is close to 
the total address space. But the fix should be easy: call gc() if 
malloc() fails. I initially hacked at stealing vectors from the free 
list because I thought the problem I was seeing was due to address 
space fragmentation. The latter could still be a problem and would be 
harder to fix.
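
A minimal sketch of the "call gc() if malloc() fails" idea, assuming the
allocator and the collector can be passed in as plain function pointers.
alloc_with_gc_retry, alloc_fn, collect_fn and the demo_* stand-ins are all
hypothetical names for illustration; R's actual allocVector() is organized
differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types; R's real allocator and collector differ. */
typedef void *(*alloc_fn)(size_t);
typedef void (*collect_fn)(void);

/* Try to allocate; on failure, run the collector once and retry. */
void *alloc_with_gc_retry(size_t nbytes, alloc_fn alloc, collect_fn collect) {
    assert(alloc != NULL && collect != NULL);
    void *p = alloc(nbytes);
    if (p == NULL) {
        collect();          /* release garbage held by the heap */
        p = alloc(nbytes);  /* second and final attempt */
    }
    return p;  /* may still be NULL; the caller must report the error */
}

/* Stand-ins for the situation in the trace: allocation fails for as
   long as uncollected garbage is occupying the heap. */
size_t demo_garbage_mb = 500;
int demo_collections = 0;

void *demo_alloc(size_t nbytes) {
    return demo_garbage_mb > 0 ? NULL : malloc(nbytes);
}

void demo_collect(void) {
    demo_garbage_mb = 0;
    demo_collections++;
}
```

With these stand-ins, alloc_with_gc_retry(100, demo_alloc, demo_collect)
fails on the first attempt, collects once, and then succeeds. This only
helps when the collection actually frees a large enough contiguous block;
it does nothing for real address space fragmentation.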

Thanks Luke and Brian!
Nawaaz


