[Rd] R/C++/memory leaks

Ernest Turro ernest.turro at ic.ac.uk
Mon Feb 26 17:08:55 CET 2007


Thanks for your comments Ross. A couple more comments/queries below:

On 26 Feb 2007, at 06:43, Ross Boylan wrote:

> [details snipped]
>
> The use of the R api can be confined to a wrapper function.  But I can
> think of no reason that a change to the alternate approach I outlined
> would solve the apparent leaking you describe.
>

I'm not sure I see how confining use of the R API to a wrapper
function would suffice. Example:

During heavy computation in the C++ function I need to allow  
interrupts from R. This means that R_CheckUserInterrupt needs to be  
called during the computation. Therefore, use of the R API can't be  
confined to just the wrapper function.
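
For instance, a minimal sketch of the kind of loop I mean (the
function and variable names are just placeholders):

   #include <R_ext/Utils.h>       /* declares R_CheckUserInterrupt */

   void heavy_computation(double *x, int n)
   {
       for (int i = 0; i < n; i++) {
           /* ... expensive per-element work on x[i] ... */
           if (i % 1000 == 0)
               R_CheckUserInterrupt(); /* does not return if an
                                          interrupt is pending */
       }
   }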

In fact, I'm worried that some of the libraries I'm using fail to
release memory after an interrupt, and that this is the problem. I
can't see what I could do about that... E.g.

#include <valarray>

std::valarray<double> foo; // I don't know 100% that the foo object hasn't
                           // allocated some memory; if the program is
                           // interrupted, it wouldn't be released...
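
More concretely, the kind of situation I'm worried about is something
like this (a sketch; as far as I can tell, an interrupt makes R
longjmp out of the .C call, so C++ destructors of objects on the
stack are never run):

   #include <valarray>
   #include <R_ext/Utils.h>

   void compute(int n)
   {
       std::valarray<double> work(n); /* heap buffer owned by work */
       for (int i = 0; i < n; i++) {
           /* ... */
           R_CheckUserInterrupt();    /* on interrupt: control jumps
                                          back to R, work's destructor
                                          never runs, and its buffer
                                          is never freed */
       }
   }                                  /* normal exit: destructor runs
                                          and the buffer is released */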

I find it very unfortunate that R_CheckUserInterrupt doesn't return a
value. If it did (e.g. if it returned true when an interrupt has
occurred), I could just branch off somewhere, clean up properly and
return to R.
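
That is, something along these lines (purely hypothetical;
R_CheckUserInterrupt doesn't actually behave like this):

   if (R_CheckUserInterrupt()) {  /* imagined boolean return value */
       /* free buffers, close files, ... */
       return;                    /* hand control back to R cleanly */
   }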

Any ideas on how this could be achieved?

Thanks,

E


>>
>>>>
>>>> I am concerned about the following. In square brackets you see R's
>>>> total virtual memory use (VIRT in `top`):
>>>>
>>>> 1) Load library and data [178MB] (if I run gc(), then [122MB])
>>>> 2) Just before .C [223MB]
>>>> 3) Just before freeing memory [325MB]
>>> So you explicitly call your freeMemory() function?
>>
>> This is called thanks to on.exit()
>>
>>>> 4) Just after freeing memory [288MB]
>>> There are at least 3 possibilities:
>>>  * your C++ code is leaking
>>
>> The number of news and deletes is the same, and so is their
>> branching... I don't think it is this.
>>
>>>  * C++ memory is never really returned. (Commonly, at least in C,
>>>  the amount of memory allocated to the process never goes down,
>>>  even if you do a free.  This may depend on the OS and the
>>>  specific calls the program makes.)
>>
>> OK, but the memory should be freed after the process completes,
>> surely?
>
> Most OS's I know will free memory when a process finishes, except for
> shared memory.  But is that relevant?  I assume the process doesn't
> complete until you exit R.  Your puzzle seems to involve different
> stages within the life of a single process.
>
>>
>>>  * You did other stuff in R that's still around.  After all you
>>>  went up +45MB between 1 and 2; maybe it's not so odd you went up
>>>  +65MB between 2 and 4.
>>
>> Yep, I do stuff before .C and that accounts for the increase
>> before .C. But all the objects created before .C go out of scope by
>> 4) and so, after gc(), we should be back to 122MB. As I mentioned,
>> ls() after 5) returns only the data loaded in 1).
>
> In principle (and according to ?on.exit) the expression registered by
> on.exit is evaluated when the relevant function is exited.  In
> principle garbage collection reclaims all unused space (though with no
> guarantee of when).
>
> It may be that the practice is looser than the principle.  For
> example, Python always nominally managed memory for you, but I think
> for quite a while it didn't really reclaim the memory (because
> garbage collection didn't exist or had been turned off).
>
>
>>
>>>> 5) After running gc() [230MB]
>>>>
>>>> So although the freeMemory function works (frees 37MB), R ends up
>>>> using 100MB more after the function call than before it. ls() only
>>>> returns the data object so no new objects have been added to the
>>>> workspace.
>>>>
>>>> Do any of you have any idea what could be eating this memory?
>>>>
>>>> Many thanks,
>>>>
>>>> Ernest
>>>>
>>>> PS: it is not practical to use R_alloc et al because C++
>>>> allocation/deallocation involves constructors/destructors and
>>>> because the C++ code is also compiled into a standalone binary
>>>> (I would rather avoid maintaining two separate versions).
>>>
>>> I use regular C++ new's too (except for the external pointer that's
>>> returned).  However, you can override the operator new in C++ so
>>> that it uses your own allocator, e.g., R_alloc.  I'm not sure about
>>> all the implications that might make that dangerous (e.g., can the
>>> memory be garbage collected?  can it be moved?).  Overriding new is
>>> a bit tricky since there are several variants.  In particular,
>>> there is one with and one without an exception.  Also, individual
>>> classes can define their own new operators; if you have any, you'd
>>> need to change those too.
>>>
>>
>> That sounds rather dangerous!
> At least tedious to get right.  My statements weren't intended as a
> recommendation of this approach; I was just pointing out R_alloc and
> C++ allocation could probably be fit together.  If your C++ program
> isn't doing anything exotic with memory management there are probably
> 4 operators to redefine ([singleton and array allocation] x
> [exception specification present or absent]).  Oops, you'd need to
> get the deletes as well...
>
>>
>> Thanks very much for your thoughts, though.
>>
> You could also try some memory leak detector on the problem to narrow
> it down.
>
>>> Ross Boylan
>>>
>>
>>
>> Ernest


