[Rd] Memory allocation in C/C++ vs R?

Simon Urbanek simon.urbanek at r-project.org
Fri Apr 30 21:10:27 CEST 2010


Dominick,

On Apr 30, 2010, at 2:51 PM, Dominick Samperi wrote:

> Thanks for the clarification Simon,
> 
> I think it is safe (R-safe?) to say that if there are no exceptions or errors
> on either side, then it is highly likely that everything is fine.
> 

I think so - at least on the exceptions topic. 


> When there are errors or exceptions, it is probably NOT safe to try to
> recover. Better to terminate the R session, insert some debug print statements
> (or breakpoints), and try to figure out what caused the problem.
> 

At the minimalistic level, yes. You can do better e.g. by tracking your objects, allocating only on the heap and so on, but the above will make sure there is no unexpected memory corruption or leakage (the hard way ;)).
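
One way to read "tracking your objects" is a registry of heap allocations that can be swept later, for instance at the start of the next entry point, if an earlier call was left irregularly. The following is only a sketch under that assumption: `Buffer`, `buffer_create`, `buffer_destroy` and `buffer_sweep` are made-up names, and no R API is used so that the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical resource type; stands in for any C++ object owned by the package.
struct Buffer {
    double *values;
    explicit Buffer(std::size_t n) : values(new double[n]()) {}
    ~Buffer() { delete[] values; }
    Buffer(const Buffer &) = delete;
    Buffer &operator=(const Buffer &) = delete;
};

// Registry of live buffers. It is a plain pointer, so nothing is constructed
// at load time; the set itself is created on the heap on first use.
static std::set<Buffer *> *g_live = nullptr;

// Allocates a buffer on the heap and records it in the registry.
Buffer *buffer_create(std::size_t n) {
    if (g_live == nullptr) g_live = new std::set<Buffer *>();
    Buffer *b = new Buffer(n);
    try {
        g_live->insert(b);
    } catch (...) {
        delete b;  // do not leak if the registry itself cannot grow
        throw;
    }
    return b;
}

// Regular release path: frees the buffer only if it is still registered.
void buffer_destroy(Buffer *b) {
    if (g_live != nullptr && g_live->erase(b) == 1) delete b;
}

// Recovery path: frees everything still registered and returns how many
// buffers had been left behind.
std::size_t buffer_sweep() {
    if (g_live == nullptr) return 0;
    const std::size_t reclaimed = g_live->size();
    for (Buffer *b : *g_live) delete b;
    g_live->clear();
    assert(g_live->empty());
    return reclaimed;
}
```

Because nothing lives on the C++ stack except raw pointers, a skipped destructor cannot leak memory for good: whatever was not released through `buffer_destroy` is still in the registry and can be reclaimed by `buffer_sweep`.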


> In other words, it works when it works.
> 
> This does not address the static initializer issue.
> 

Indeed. As I said, using heap objects (explicit initialization) does solve that issue, but you again have to be wary of other libraries which may still use static initializers.
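
A minimal sketch of explicit initialization, assuming a lookup table that would otherwise be a namespace-scope object with a constructor. The names `Table`, `table_init`, `table_release` and `table_lookup` are hypothetical. In a package the init function could be called from the routine R runs when the shared object is loaded (`R_init_<pkgname>`), but the snippet itself does not depend on R.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table type that would normally be a static object.
typedef std::map<std::string, int> Table;

// A plain pointer needs no constructor call at load time.
static Table *g_table = nullptr;

// Explicit initialization: builds the table on the heap, once.
void table_init() {
    if (g_table == nullptr) {
        g_table = new Table();
        (*g_table)["one"] = 1;
        (*g_table)["two"] = 2;
    }
}

// Explicit teardown: safe to call even if table_init() never ran.
void table_release() {
    delete g_table;
    g_table = nullptr;
}

// Returns the value stored for key, or fallback if the key is missing
// or the table has not been initialized.
int table_lookup(const char *key, int fallback) {
    assert(key != nullptr);
    if (g_table == nullptr) return fallback;
    Table::const_iterator it = g_table->find(key);
    return it == g_table->end() ? fallback : it->second;
}
```

The design choice is that the program never relies on the loader to run constructors: the object exists only between explicit `table_init()` and `table_release()` calls. This does nothing about static initializers inside third-party libraries.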


> The good news is that R with C++/STL seems to work most of the
> time, on all of the architectures for which CRAN builds a binary.
> 

I would expect that as long as you can separate the setup part (creating output R objects etc.) from the C++ work (which won't call back into R and will catch all its exceptions) you should be safe. The danger is that in theory STL may create some object to keep beyond the scope of the call, but hopefully it won't.
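
The separation can be sketched as a boundary function that catches every C++ exception and reports failure through plain data. This is an illustration only: `Scratch`, `sum_first`, `guarded_sum` and `g_destroyed` are invented names, and the R side is left out so that the code compiles standalone. In a real `.Call` routine the result object would be allocated before calling the boundary function, and `error()` would be called only after it has returned false.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Hypothetical counter that makes destructor calls observable.
static int g_destroyed = 0;

// Hypothetical local object with a destructor that must not be skipped.
struct Scratch {
    std::vector<int> data;
    ~Scratch() { ++g_destroyed; }
};

// Pure C++ work: it never calls the R API, so no R error can jump out of it.
static long sum_first(std::size_t n, std::size_t limit) {
    Scratch s;
    if (n > limit) throw std::length_error("requested size too large");
    s.data.assign(n, 1);
    long total = 0;
    for (int v : s.data) total += v;
    return total;
}

// Boundary function: no exception leaves it. Returns true and stores the
// result in *out on success; on failure returns false and copies a message
// into msg (at most msg_len bytes, always NUL-terminated).
bool guarded_sum(std::size_t n, std::size_t limit, long *out,
                 char *msg, std::size_t msg_len) {
    assert(out != nullptr && msg != nullptr && msg_len > 0);
    try {
        *out = sum_first(n, limit);
        return true;
    } catch (const std::exception &e) {
        std::snprintf(msg, msg_len, "%s", e.what());
    } catch (...) {
        std::snprintf(msg, msg_len, "%s", "unknown C++ exception");
    }
    return false;
}
```

The caller holds only a `long` and a `char` buffer, neither of which has a destructor, so signalling an R error after `guarded_sum` has returned does not bypass any C++ cleanup.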

Cheers,
Simon



> 
> 
> On Fri, Apr 30, 2010 at 2:12 PM, Simon Urbanek
> <simon.urbanek at r-project.org> wrote:
>> Dominick,
>> 
>> On Apr 30, 2010, at 1:40 PM, Dominick Samperi wrote:
>> 
>>> Just to be sure that I understand, are you suggesting that the R-safe way to do things is to not use STL, and to not use C++ memory management and exception handling? How can you leave a function in an irregular way without triggering a seg fault or something like that, in which case there is no chance for recovery anyway?
>>> 
>>> In my experience the C++ exception stack seems to unwind properly before returning to R when there is an exception, and memory that is allocated by C++ functions seems to maintain its integrity and does not interfere with R's memory management.
>>> 
>>> It would be helpful if you could specify what kind of interference you are referring to here between C++ exception handling and R's error handling, and why STL is dangerous and best avoided in R. I have used STL with R for a long time and have experienced no problems.
>>> 
>> 
>> There are essentially two issues here that I had in mind.
>> 
>> 1) C++ exception handling and R error handling both use setjmp/longjmp with the assumption that no one else does the same. That assumption is voided when both are used, so interleaving them will cause problems (you're fine if you can guarantee that they always stack, but that's not always easy to achieve yet easy to miss).
>> 
>> 2) C++ compilers assume that you cannot leave the context of a function in unusual ways. But you can, namely if an R error is raised. This affects (among others) locally allocated objects.
>> 
>> 
>> On 1:
>> 
>> You cannot interleave R error handling and C++ exceptions. For example, if there is a chance of a C++ exception you must guarantee that the exception won't leave the R context that you are in. This is easily demonstrated because R checks the consistency (see ex.1). Vice versa, the consequences are not easily visible, because C++ provides no tracking, but they are equally fatal. If you raise an R error from C++, it does not clean up whatever C++ exception context you were in and bypasses it. But there are even more grave consequences:
>> 
>> On 2:
>> 
>> If you raise any R error from within C++ code you'll break the assumption of C++ that it has control over the entry/exit point of a function. Take a really trivial example:
>> 
>> void foo() {
>>   Object o;
>>   // some other code ....
>>   error("blah");
>> }
>> 
>> Normally, the life of o is controlled by C++ and it will correctly execute its destructor when you leave the function. However, the error call in R will cause it to bypass that: the object won't be destroyed even though it was allocated on the stack. Although it's obvious in the example above, pretty much all R API functions can raise errors, so the same applies to any R API call - direct or indirect. As a consequence you pretty much cannot call R API functions from C++ unless you are very, very careful (don't forget that C++ does a lot of things behind your back such as initializing objects, exception contexts etc. which you technically have no control over).
>> 
>> 
>> As I said in my post, you can write safe C++ code, but you have to be very careful. But the point about libraries is that you have no control over what they do, so you cannot know whether they will interact in a bad way with R or not. STL is an example where only the interface is defined; the implementations are not and vary by OS, compiler etc. This makes it pretty much impossible to use it reliably: the fact that it works on one implementation doesn't mean that it will work on another, since it is the implementation details that will bite you. (I know that we had reports of things breaking due to STL but I don't remember what implementation/OS it was.)
>> 
>> [The above issues are only the ones I was pointing out; there may be others that are not covered here.]
>> 
>> Cheers,
>> Simon
>> 
>> 
>> 
>> 
>> ---- R context vs C++ exception example
>> 
>> 
>>> dyn.load("stl.so")
>>> .Call("bar")
>> something went wrong somewhere in C++...
>> Warning: stack imbalance in '.Call', 2 then 4
>> NULL
>> 
>> -- what happens is that this really corrupts the R call stack since the C++ exception mechanism bypassed R's call stack so R is now in an inconsistent state. The same can be invoked vice-versa (and is more common - using error in C++ will do it) but that's harder to show because you would have to track C++ allocations to see that you're leaking objects all over the place. That is also the reason why it's hard to find until it's too late (and things may *appear* to work for some time while they are not).
>> 
>> 
>> ----stl.cc:
>> 
>> #include <Rinternals.h>
>> #include <vector>
>> 
>> using namespace std;
>> 
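>> // Throws a C++ exception: -1 converts to the largest size_type value,
>> // so resize() fails with std::length_error.
>> extern "C" SEXP foo() {
>>  vector<int> a;
>>  a.resize(-1);
>>  return R_NilValue;
>> }
>> 
>> // Calls foo() through R's evaluator, so the exception thrown in foo()
>> // unwinds past R's own contexts before it is caught here.
>> extern "C" SEXP bar() {
>>  try {
>>   // lots of other C++ code here ...
>>   eval(lang2(install(".Call"),mkString("foo")), R_GlobalEnv);
>>  } catch (...) {
>>   REprintf("something went wrong somewhere in C++...\n");
>>  }
>>  return R_NilValue;
>> }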
>> 
>>> The fact that R has a C main may be problematic because C++ static
>>> initializers may not be called properly, but the fact that packages are
>>> usually loaded dynamically complicates this picture. The dynamic
>>> library itself may take care of calling the static initializers (I'm not
>>> sure about this, and this is probably OS-dependent). One possible
>>> work-around would be to compile the first few lines (a stub) of
>>> R main using the C++ compiler, leaving everything else as is
>>> and compiled using the C compiler (at least until CXXR is widely
>>> available).
>>> 
>>> Since C++ (and STL) are very popular it would be helpful for developers
>>> to have a better idea of the benefits and risks of using these tools
>>> with R.
>>> 
>>> Thanks,
>>> Dominick
>>> 
>>> On Fri, Apr 30, 2010 at 9:00 AM, Simon Urbanek
>>> <simon.urbanek at r-project.org> wrote:
>>>> Brian's answer was pretty exhaustive - just one more note that is indirectly related to memory management: C++ exception handling does interfere with R's error handling (and vice versa) so in general STL is very dangerous and best avoided in R. In addition, remember that regular local object rules are broken because you are not guaranteed to leave a function the regular way, so there is a high danger of leaks and inconsistencies when using C++ memory management unless you specifically account for that. That said, I have written C++ code that works in R, but you have to be very, very careful and think twice about using any complex C++ libraries since they are unlikely to be written in an R-safe way.
>>>> 
>>>> Cheers,
>>>> Simon
>>>> 
>>>> 
>>>> On Apr 30, 2010, at 1:03 AM, Dominick Samperi wrote:
>>>> 
>>>>> The R docs say that there are two methods by which the C programmer can
>>>>> allocate memory, one where R automatically frees the memory on
>>>>> return from .C/.Call, and the other where the user takes responsibility
>>>>> for freeing the storage. Both methods involve using R-provided
>>>>> functions.
>>>>> 
>>>>> What happens when the user uses the standard "new" allocator?
>>>>> What about when a C++ application uses STL and that library
>>>>> allocates memory? In both of these cases the R-provided functions
>>>>> are not used (to my knowledge), yet I have not seen any problems.
>>>>> 
>>>>> How is the memory that R manages and garbage collects kept
>>>>> separate from the memory that is allocated on the C++ side
>>>>> quite independently of what R is doing?
>>>>> 
>>>>> Thanks,
>>>>> Dominick
>>>>> 
>>>>> 
>>>>> ______________________________________________
>>>>> R-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel