[Rd] Re: [R] Memory Fragmentation in R
Nawaaz Ahmed
nawaaz at inktomi.com
Sun Feb 20 03:40:02 CET 2005
>
> I am unclear what you actually did, but it may be a judicious gc() is
> all that was needed: otherwise the issues should be the same in the
> first and the subsequent run. That's not to say that when the trigger
> gets near the total address space we could not do better: and perhaps we
should not let it do so (if we could actually determine the size of
> the address space ... it is 2Gb or 3Gb on Windows for example).
>
I did call gc(), but only in the top-level functions; there were internal
functions in the libraries/packages that were also allocating space.
Here is how I think the problem happens. Consider code of the form

    x = as.vector(x)
    y = as.double(y)

where x is a 500 MB matrix and y is a 100 MB vector, and suppose the
heap trigger is 1201 MB in total.

Initially:
    x holds 500 MB, y holds 100 MB
    heap can grow by 601 MB

After x = as.vector(x):
    x holds 500 MB, y holds 100 MB
    as.vector() duplicated x, leaving 500 MB to be garbage collected
    heap can grow by 101 MB

During y = as.double(y):
    x holds 500 MB, y holds 100 MB, plus 500 MB awaiting collection
    as.double() requires 100 MB to duplicate y
    the garbage collector is not run, because the
        required amount (100 MB) < possible heap growth (101 MB)
    allocVector() calls malloc()
    malloc() can fail at this point:
        it cannot get a contiguous 100 MB region
You are right, it is most likely to happen close to the trigger. But the
fix should be easy (call gc() if malloc() fails). I initially hacked at
stealing vectors from the free list because I thought the problem I was
seeing was due to address-space fragmentation. The latter could still be
a problem and would be harder to fix.
Thanks Luke and Brian!
Nawaaz