[R] Solved Re: Garbage collection problem

Peter Langfelder peter.langfelder at gmail.com
Fri Jan 4 04:44:33 CET 2013


Thanks for your reply, Duncan - you hit the nail on the head (as
usual, the problem turned out to sit between the keyboard and the
chair :)). My function does return regression models that contain the
input formulae together with their associated (big) environments.
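
For the archives, a minimal sketch of the pattern (illustrative names,
not my actual code) and one possible fix; note that re-pointing the
formula's environment is only safe if nothing later needs to evaluate
the formula in the original frame:

fitOne <- function(d) {
    big <- rnorm(1e7)              # large intermediate, ~76 Mb
    fla <- y ~ x                   # environment(fla) is fitOne's frame,
                                   # which keeps 'big' reachable
    environment(fla) <- baseenv()  # cut the reference before fitting
    lm(fla, data = d)
}
d <- data.frame(x = rnorm(100), y = rnorm(100))
fit <- fitOne(d)
gc()  # 'big' is now collectable; without the reassignment it survives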

Peter


On Thu, Jan 3, 2013 at 4:41 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> On 13-01-03 7:01 PM, Peter Langfelder wrote:
>>
>> Hello all,
>>
>> I am running into a problem with garbage collection not being able to
>> free up all memory. Unfortunately I am unable to provide a minimal
>> self-contained example, although I can provide a self-contained
>> example if anyone feels like wading through some 600 lines of code. I
>> would love to isolate the relevant parts from the code but whenever I
>> try to run a simpler example, the problem does not appear.
>>
>> I run an algorithm that repeats the same calculation (on sampled, i.e.
>> different data) in each iteration. Each iteration uses relatively
>> large intermediate objects and calculations but returns a smaller
>> result; these results are then collated and returned from the main
>> function (call it myFnc). The problem is that memory used by the
>> intermediate calculations (it is difficult to say whether it is held
>> by objects or by the apply calls themselves) does not seem to be freed up even
>> after doing explicit garbage collection using gc() within the loop.
>>
>> Thus, a call of something like
>>
>> result = myFnc(arguments)
>>
>> results in some memory that does not seem to be allocated to any visible
>> objects and yet is not freed up using gc(): After executing an actual
>> call to the offending function, gc() tells me that Vcells use 538.6
>> Mb, but the sum of object.size() of all objects listed by ls(all.names
>> = TRUE) is only 183.3 Mb.
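
For concreteness, the comparison here is roughly the following (a
sketch, not the exact code I ran):

gc()                                    # Vcells "used" / "(Mb)" columns
sum(sapply(ls(all.names = TRUE),
           function(n) object.size(get(n)))) / 2^20   # total in Mb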
>>
>>
>> The thing is that if I remove 'result' using rm(result) and do gc()
>> again, the memory used decreases by a lot: gc() now reports 110.3 Mb
>> used in Vcells; this roughly corresponds to the sum of the sizes of
>> all objects returned by ls() (after removing 'result'), which is now
>> 108.7 Mb. So used memory went down by something like 428 Mb, but the
>> object.size of 'result' is only about 75 Mb.
>>
>> Thus, it seems that the memory used by internal operations in myFnc
>> that should be freed up upon the completion of the function call
>> cannot be released by garbage collection until the result of the
>> function call is also removed.
>>
>> Like I said, I tried to replicate this behaviour on simple examples
>> but could not.
>>
>> My question is, is this behaviour to be expected in complicated code,
>> or is it a bug that should be reported? Is there any way around it?
>>
>> Thanks in advance for any insights or pointers.
>
>
> I doubt if it is a bug.  Remember the warning from ?object.size:
>
> "Exactly which parts of the memory allocation should be attributed to which
> object is not clear-cut. This function merely provides a rough indication:
> it should be reasonably accurate for atomic vectors, but does not detect if
> elements of a list are shared, for example. (Sharing amongst elements of a
> character vector is taken into account, but not that between character
> vectors in a single object.)

If I understand correctly, sharing would inflate the sum of
object.size()'s relative to the values returned by gc(), correct? The
opposite is happening in my case.
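
A quick illustration of that direction of the discrepancy (sharing makes
the sum of object.size()'s overcount, since gc() counts a shared vector
only once):

x <- rnorm(1e6)
l <- list(a = x, b = x)   # both elements point at the same vector
object.size(x)            # ~8 Mb
object.size(l)            # reports ~16 Mb, though only ~8 Mb is allocated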

>
> The calculation is of the size of the object, and excludes the space needed
> to store its name in the symbol table.
>
> Associated space (e.g. the environment of a function and what the pointer in
> an EXTPTRSXP points to) is not included in the calculation."
>
> For a simple example:
>
>> x <- 1:1000000
>> object.size(x)
> 4000024 bytes
>> e <- new.env()
>> object.size(e)
> 28 bytes
>> e$x <- x
>> object.size(e)
> 28 bytes
>
> At the end, e is an environment holding an object of 4 million bytes, but
> its size is 28 bytes.  You'll get environments whenever you return functions
> from other functions (e.g. what approxfun() does), or when you create
> formulas, e.g.
>
>> f <- function() { x <- 1:1000000
> +  y <- rnorm(1000000)
> +  y ~ x
> + }
>
>> fla <- f()
>> object.size(fla)
> 372 bytes
>
> Now fla is the formula, but the data vectors x and y are part of its
> environment.
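
The same thing happens with returned closures (the approxfun() case
mentioned above); a minimal sketch:

g <- function() {
    big <- rnorm(1e6)      # lives in g's evaluation frame
    function(i) i + 1      # the closure's environment is that frame
}
h <- g()
object.size(h)             # small: the environment is not counted
ls(environment(h))         # "big" -- still reachable through h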



