[Rd] memory management

Fri Aug 14 11:45:24 CEST 2009

On Thu, 13 Aug 2009 13:42:39 -0400
Simon Urbanek <simon.urbanek at r-project.org> wrote:

> I'm not convinced that what you propose is a good idea. First, I
> don't quite understand why you would want to use an existing SEXP -
> if you had a valid SEXP for the current R instance, then there is no
> need for R_RegisterObject. If the SEXP is from a different R instance

What I need is to be able to map an arbitrary memory region containing
primitive data types to a SEXP. Basically, I need zero copy.

Maybe the function name is misleading. I don't really care about
sharing the SEXP itself between instances; I just want to share the
content.
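
To illustrate what "sharing the content" means at the OS level, here is
a minimal sketch that is independent of R. It assumes a POSIX system
with anonymous shared mappings and fork(); the helper names
map_shared_doubles and shared_roundtrip are made up for this example
and are not part of R or of the extension discussed here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map n doubles that remain shared with any child created by fork().
 * Returns NULL on failure. (Hypothetical helper, not an R API.) */
static double *map_shared_doubles(size_t n)
{
    assert(n > 0);
    void *p = mmap(NULL, n * sizeof(double), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : (double *)p;
}

/* The child writes `value` into slot 0; the parent reads it back from
 * the same physical pages, without any copy.
 * Returns the value seen by the parent, or -1.0 on error. */
static double shared_roundtrip(double value)
{
    const size_t n = 1024;
    double *data = map_shared_doubles(n);
    if (data == NULL)
        return -1.0;
    data[0] = 0.0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(data, n * sizeof(double));
        return -1.0;
    }
    if (pid == 0) {            /* child: modify the shared payload */
        data[0] = value;
        _exit(0);
    }

    int status = 0;
    double seen = -1.0;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        seen = data[0];        /* parent: sees the child's write */

    munmap(data, n * sizeof(double));
    return seen;
}
```

With MAP_PRIVATE instead of MAP_SHARED the mapping would be
copy-on-write, so the parent would not see the child's write.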

What I'm doing now is pre-calculating the space needed for the SEXP
header, so that DATA_PTR falls exactly on a page boundary. Every process
then gets its own copy of the SEXP structure (which is initialized in
each instance), but shares the content. This wastes a page for every
SEXP structure, but it's worth it considering the size of the array.
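
The arithmetic behind that layout can be sketched as follows. This is
only an illustration: the header size is passed in as a parameter
because the real size of R's vector header depends on the R version and
platform, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Round n up to the next multiple of align (a power of two). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Offset, inside a page-aligned region, at which a header of hdr_size
 * bytes must start so that the payload immediately following it begins
 * on a page boundary. */
static size_t header_offset(size_t hdr_size, size_t page)
{
    return round_up(hdr_size, page) - hdr_size;
}

/* Total bytes needed: the private page(s) holding the header plus the
 * payload rounded up to whole pages. */
static size_t region_size(size_t hdr_size, size_t payload, size_t page)
{
    return round_up(hdr_size, page) + round_up(payload, page);
}

/* Page size reported by the system, with a conventional fallback. */
static size_t system_page_size(void)
{
    long p = sysconf(_SC_PAGESIZE);
    return p > 0 ? (size_t)p : 4096;
}
```

For example, with an assumed 40-byte header and 4096-byte pages, the
header starts at offset 4056 and the data at offset 4096; the rest of
the first page is the wasted space mentioned above.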

> What you possibly want (AFAIR) is a special allocVector version for  
> primitive types that defines the memory location in advance, so you  
> could specify a COW memmapped region used by the SEXP from the other  
> instance. Still, in either case you can only share primitive types  

An allocVector-like interface would be the same for me:

  SEXP allocVector(type, n, void* ptr)

though I would still need to know the offset from where the data
starts.

Right now, I'm doing the work of allocVector in the extension, and
then calling R_RegisterObject. Using allocVector(type, n, ptr) is
certainly more sane.
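
As a toy model of what such an interface would mean, the sketch below
builds a small private header that borrows a caller-supplied payload
pointer. It deliberately does not use R's headers or internals: the
ext_* names and the struct layout are invented for illustration and say
nothing about how R would actually implement this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for the primitive vector types (hypothetical names). */
typedef enum { EXT_LGL, EXT_INT, EXT_REAL, EXT_CPLX } ext_type;

/* Per-process header; the payload is borrowed, never owned. */
typedef struct {
    ext_type type;
    size_t   length;
    void    *data;
} ext_vector;

/* Element size in bytes for each primitive type. */
static size_t ext_elt_size(ext_type t)
{
    switch (t) {
    case EXT_LGL:                      /* logicals are stored as int */
    case EXT_INT:  return sizeof(int);
    case EXT_REAL: return sizeof(double);
    case EXT_CPLX: return 2 * sizeof(double);
    }
    return 0;
}

/* Create a header describing n elements located at ptr.
 * Nothing is copied; ptr could point into a shared mapping. */
static ext_vector *ext_alloc_vector(ext_type type, size_t n, void *ptr)
{
    assert(ptr != NULL);
    ext_vector *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->type = type;
    v->length = n;
    v->data = ptr;
    return v;
}

/* Release the header only; the payload belongs to whoever mapped it. */
static void ext_free_vector(ext_vector *v)
{
    free(v);
}
```

The point of the last function is that tearing down the header must
never free() the payload, since the payload was not obtained from
malloc in the first place.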

> (INTSXP, REALSXP, LGLSXP, CPLXSXP) because anything more complex  
> (VECSXP, STRSXP, ...) requires you to re-map the payload as well and  
> you're back in the trouble of dependent SEXPs. But maybe I'm missing  

I'm aware of this, but that's enough.
I handle strings by calling allocVector as usual and creating copies.

> PS: In addition, I think your implementation of R_UnregisterObject
> is too dangerous and superfluous - AFAICS it will break if there
> happens to exist a reference to the node (which you have no control
> of) since you unsnap it unconditionally. It also makes it impossible
> to use a finalizer, because you're forcefully preserving the object
> from collection (R actually allows double-release but you should not
> rely on it). Normally, you should not need R_UnregisterObject at
> all, because the GC should take care of it once you release it.

I see. I don't want the GC to take any action on the node however,
since trying to free() would fail, and re-using the content would
duplicate shared pages.

Would unsnapping the object from a finalizer be a good solution,
instead of using R_[Un]ProtectObject?