[Rd] R/C++/memory leaks

Mon Feb 26 00:18:56 CET 2007

On 25 Feb 2007, at 22:21, Ross Boylan wrote:

> On Sun, Feb 25, 2007 at 05:37:24PM +0000, Ernest Turro wrote:
>> Dear all,
>>
>> I have wrapped a C++ function in an R package. I allocate/deallocate
>> memory using C++ 'new' and 'delete'. In order to allow user
>> interrupts without memory leaks I've moved all the delete statements
>> required after an interrupt to a separate C++ function freeMemory(),
>> which is called using on.exit() just before the .C() call.
>
> Do you mean that you call on.exit() before the .C, and the call to
> on.exit() sets up the handler?  Your last sentence sounds as if you
> invoke freeMemory() before the .C call.
>

" 'on.exit' records the expression given as its argument as needing
      to be executed when the current function exits (either naturally
      or as the result of an error)."

This means you call on.exit() somewhere at the top of the function.
You are guaranteed that the expression you pass to on.exit() will be
executed before the function returns. So, even though you call
on.exit() before .C(), the expression you pass it will actually be
evaluated after .C().

This means you can be sure that freeMemory() is called even if an  
interrupt or other error occurs.
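
To illustrate the arrangement, here is a minimal sketch of what a
freeMemory() that is safe to run from on.exit() might look like on the
C++ side. The names Workspace, runModel and workspaceAlive are
hypothetical and not taken from the actual package. The assumption is
that everything allocated during the .C() call hangs off a single
owner, so that freeing is a no-op if nothing was allocated or if it
has already run.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for whatever the real computation allocates.
struct Workspace {
    std::vector<double> buffer;
    explicit Workspace(std::size_t n) : buffer(n, 0.0) {}
};

namespace {
// Single owner for everything the .C() entry point allocates.
std::unique_ptr<Workspace> g_workspace;
}

extern "C" {

// Entry point called through .C(); arguments arrive as pointers.
void runModel(int* n) {
    assert(n != nullptr);
    g_workspace.reset(new Workspace(static_cast<std::size_t>(*n)));
    // ... long-running computation that may be interrupted ...
}

// Intended to be registered from R with on.exit(). Safe to call more
// than once, and safe to call if runModel() never allocated anything.
void freeMemory() {
    g_workspace.reset();
}

}  // extern "C"

// Hypothetical helper, only used to inspect the state.
bool workspaceAlive() { return static_cast<bool>(g_workspace); }
```

On the R side this would pair with something like
on.exit(.C("freeMemory")) placed before the .C("runModel", ...) call.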

> Another approach is to associate your C objects with an R object, and
> have them cleaned up when the R object gets garbage collected.
> However, this requires switching to a .Call interface from the more
> straightforward .C interface.
>
> The finalizer call I used doesn't assure cleanup on exit. The optional
> argument to R_RegisterCFinalizerEx might provide such assurance, but I
> couldn't tell what it really does.  Since all memory should
> be released by the OS when the process ends, I wasn't so worried
> about that.
>
>
> Here's the pattern:
> // I needed R_NO_REMAP to avoid name collisions.  You may not.
> #define R_NO_REMAP 1
> #include <R.h>
> #include <Rinternals.h>
>
> extern "C" {
> // returns an |ExternalPtr|
> SEXP makeManager(
> 	@<makeManager args@>);
>
>
> // user should not need to call
> // cleanup
> void finalizeManager(SEXP ptr);
>
> }
>
> SEXP makeManager(
> 	@<makeManager args@>){
>     // .... stuff
>
>     Manager* pmanager = new Manager(pd, pm.release(),
>     	*INTEGER(stepNumerator), *INTEGER(stepDenominator),
>     	(*INTEGER(isexact)) != 0);
>
>     // one example didn't use |PROTECT()|
>     SEXP ptr;
>     Rf_protect(ptr = R_MakeExternalPtr(pmanager, R_NilValue,
>                                        R_NilValue));
>     R_RegisterCFinalizer(ptr, (R_CFinalizer_t) finalizeManager);
>     Rf_unprotect(1);
>     return ptr;
>
> }
>
> void finalizeManager(SEXP ptr){
>   Manager *pmanager = static_cast<Manager *>(R_ExternalPtrAddr(ptr));
>   delete pmanager;
>   R_ClearExternalPtr(ptr);
> }
>
> I'd love to hear from those more knowledgeable about whether I did
> that right, and whether the FinalizerEx call can assure cleanup on
> exit.
>
> makeManager needs to be called from R like this:
>       mgr <- .Call("makeManager", args)
>

Since this is a standalone C++ program too, I'd rather use the R API
as little as possible. But I will look at your solution if I find
it is really necessary. Thanks.

>>
>> I am concerned about the following. In square brackets you see R's
>> total virtual memory use (VIRT in `top`):
>>
>> 1) Load library and data [178MB] (if I run gc(), then [122MB])
>> 2) Just before .C [223MB]
>> 3) Just before freeing memory [325MB]
> So you explicitly call your freeMemory() function?

This is called thanks to on.exit()

>> 4) Just after freeing memory [288MB]
> There are at least 3 possibilities:
>   * your C++ code is leaking

The numbers of news and deletes are the same, and so is their
branching, so I don't think it is this.
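
Matching new and delete statements in the source does not by itself
show that they match at run time, so a run-time count is a useful
cross-check. Below is a minimal sketch, assuming a hypothetical class
Tracked standing in for one of the real types; it counts live
instances so that a leak shows up as a non-zero total after cleanup.

```cpp
#include <cassert>

// Hypothetical class standing in for one of the real C++ types.
class Tracked {
public:
    Tracked() { ++live_; }
    Tracked(const Tracked&) { ++live_; }
    ~Tracked() {
        --live_;
        assert(live_ >= 0);  // a negative count would mean a double delete
    }
    // Number of Tracked objects currently alive.
    static long live() { return live_; }
private:
    static long live_;
};
long Tracked::live_ = 0;

// Creates and destroys a few objects and returns how many are still
// alive afterwards; a leak-free run returns 0.
long leakCheck() {
    Tracked* a = new Tracked;
    Tracked* b = new Tracked[3];
    delete a;
    delete[] b;  // array form must match new[]
    return Tracked::live();
}
```

Calling Tracked::live() at the end of freeMemory() would then show
whether any instances survived the cleanup.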

>   * C++ memory is never really returned.  (Commonly, at least in C, the
>   amount of memory allocated to the process never goes down, even if
>   you do a free.  This may depend on the OS and the specific calls the
>   program makes.)

OK, but the memory should be freed after the process completes, surely?

>   * You did other stuff in R  that's still around.  After all you went
>   up +45MB between 1 and 2; maybe it's not so odd you went up +65MB
>   between 2 and 4.

Yep, I do stuff before .C and that accounts for the increase
before .C. But all the objects created before .C go out of scope by
4) and so, after gc(), we should be back to 122MB. As I mentioned,
ls() after 5) returns only the data loaded in 1).

>> 5) After running gc() [230MB]
>>
>> So although the freeMemory function works (frees 37MB), R ends up
>> using 100MB more after the function call than before it. ls() only
>> returns the data object so no new objects have been added to the
>> workspace.
>>
>> Do any of you have any idea what could be eating this memory?
>>
>> Many thanks,
>>
>> Ernest
>>
>> PS: it is not practical to use R_alloc et al because C++ allocation/
>> deallocation involves constructors/destructors and because the C++
>> code is also compiled into a standalone binary (I would rather avoid
>> maintaining two separate versions).
>
> I use regular C++ new's too (except for the external pointer that's
> returned).  However, you can override the operator new in C++ so that
> it uses your own allocator, e.g., R_alloc.  I'm not sure about all the
> implications that might make that dangerous (e.g., can the memory be
> garbage collected?  can it be moved?).  Overriding new is a bit tricky
> since there are several variants.  In particular, there is one with
> and one without an exception.  Also, individual classes can define
> their own new operators; if you have any, you'd need to change those
> too.
>
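
For reference, the two variants mentioned above can be sketched as
follows. This is only an illustration under stated assumptions:
customAlloc and customFree are hypothetical names, std::malloc and
std::free stand in for whatever allocator one would really plug in,
and the counters exist only to make the effect observable. It does
not use R_alloc.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Counters updated by the replacement operators below.
static std::size_t g_allocs = 0;
static std::size_t g_frees = 0;

// Hypothetical custom allocator; std::malloc stands in for it here.
static void* customAlloc(std::size_t n) { return std::malloc(n ? n : 1); }
static void customFree(void* p) { std::free(p); }

// Throwing variant: must not return null, so failure becomes bad_alloc.
void* operator new(std::size_t n) {
    void* p = customAlloc(n);
    if (!p) throw std::bad_alloc();
    ++g_allocs;
    return p;
}

// Non-throwing variant: reports failure by returning null.
void* operator new(std::size_t n, const std::nothrow_t&) noexcept {
    void* p = customAlloc(n);
    if (p) ++g_allocs;
    return p;
}

// Matching deallocation. The default array, sized and nothrow forms
// of operator delete forward to this one.
void operator delete(void* p) noexcept {
    if (!p) return;
    ++g_frees;
    customFree(p);
}

// Allocates through both variants, frees both, and returns the
// allocation/free imbalance over that span; 0 means nothing leaked.
long newDeleteBalance() {
    const std::size_t a0 = g_allocs, f0 = g_frees;
    int* a = new int(1);
    int* b = new (std::nothrow) int(2);
    delete a;
    delete b;
    return static_cast<long>((g_allocs - a0) - (g_frees - f0));
}
```

Any class that defines its own operator new would bypass these
replacements and would need the same treatment separately.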

That sounds rather dangerous!

Thanks very much for your thoughts, though.

> Ross Boylan
>

Ernest