[Rd] memory management

Simon Urbanek simon.urbanek at r-project.org
Thu Aug 13 19:42:39 CEST 2009


Yuri,

I'm not convinced that what you propose is a good idea. First, I don't
quite understand why you would want to use an existing SEXP: if you
have a valid SEXP for the current R instance, then there is no need for
R_RegisterObject. If the SEXP comes from a different R instance, then you
can't use it, because it can be anything and may contain references to
other SEXPs in the other instance, which are invalid in this one (including
all of the internal ones such as R_NilValue). Hence I don't see what
R_RegisterObject would buy you.

What you possibly want (AFAIR) is a special allocVector version for
primitive types that takes the memory location in advance, so you
could specify a COW memory-mapped region used by the SEXP from the other
instance. Still, in either case you can only share primitive types
(INTSXP, REALSXP, LGLSXP, CPLXSXP), because anything more complex
(VECSXP, STRSXP, ...) requires you to re-map the payload as well, and
you're back to the problem of dependent SEXPs. But maybe I'm missing
something - describing what you really do with it may help, since
R_RegisterObject in itself doesn't make much sense to me ...
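
To illustrate only the COW mapping part of the above, here is a minimal
sketch in plain POSIX C. It assumes the file holds nothing but raw doubles,
and the helper names (map_doubles_cow, unmap_doubles, cow_demo) are made up
for this example. It deliberately does not touch R: attaching such a pointer
to a REALSXP is exactly the step that would need the allocVector variant
described above.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a file of raw doubles copy-on-write.
 * Returns NULL on failure; on success *n receives the element count. */
static double *map_doubles_cow(const char *path, size_t *n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0 ||
        (size_t) st.st_size % sizeof(double) != 0) {
        close(fd);
        return NULL;
    }

    /* MAP_PRIVATE + PROT_WRITE: writes land in private page copies and
     * never reach the file, so untouched pages stay shared. */
    void *p = mmap(NULL, (size_t) st.st_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) return NULL;

    *n = (size_t) st.st_size / sizeof(double);
    return (double *) p;
}

/* Hypothetical helper: release a mapping obtained from map_doubles_cow. */
static int unmap_doubles(double *p, size_t n)
{
    return munmap(p, n * sizeof(double));
}

/* Writes three doubles to `path`, modifies the private mapping, and
 * returns 0 if the file on disk was left untouched by that write. */
static int cow_demo(const char *path)
{
    const double src[3] = { 1.0, 2.0, 3.0 };
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(src, sizeof(double), 3, f);
    if (fclose(f) != 0 || written != 3) return -1;

    size_t n = 0;
    double *v = map_doubles_cow(path, &n);
    if (!v) return -1;
    v[0] = 42.0; /* triggers a private copy of the first page */

    double first = 0.0;
    size_t got = 0;
    f = fopen(path, "rb");
    if (f) {
        got = fread(&first, sizeof(double), 1, f);
        fclose(f);
    }

    int ok = (n == 3 && got == 1 && first == 1.0 && v[0] == 42.0);
    unmap_doubles(v, n);
    return ok ? 0 : 1;
}
```

Calling cow_demo() with a scratch file path should return 0 on a POSIX
system. Note that this only works for a flat payload; a VECSXP or STRSXP
has no such flat representation, which is the limitation mentioned above.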

Cheers,
Simon

PS: In addition, I think your implementation of R_UnregisterObject is
too dangerous and superfluous - AFAICS it will break if there happens
to exist a reference to the node (over which you have no control), since
you unsnap it unconditionally. It also makes it impossible to use a
finalizer, because you're forcefully preserving the object from
collection (R actually allows double-release, but you should not rely
on it). Normally, you should not need R_UnregisterObject at all,
because the GC should take care of the object once you release it.

On Aug 13, 2009, at 12:09 , Yuri D'Elia wrote:

> Hi everyone. In response to my previous message (Memory management
> issues), I've come up with the following patch against R 2.9.1.
>
> To summarize the situation:
>
> - We're hitting the memory limit in our lab when running concurrent R
>   processes due to the large datasets we use.
> - We don't want to copy data back and forth between our R extension
>   and R, in order to reduce overall memory usage.
>
> There were some very useful suggestions in the list, but nothing
> optimal.
>
> With this patch, I export two new functions from memory.c called
> R_RegisterObject and R_UnregisterObject, which simply allow bypassing
> allocVector. They accept a SEXP node (which needs to be allocated and
> initialized externally), protect it from collection by calling
> R_PreserveObject, and snap it temporarily into the GC's oldest and
> largest heap generation until the object is unregistered.
>
> Since these functions require knowledge of the inner workings of the
> SEXP object, they are exported only if USE_RINTERNALS is defined.
>
> By using these two functions, we developed a simple R extension which
> allows loading data.frames directly from COW memory pages by
> using mmap(), resulting in significant memory sharing between
> various processes using the same datasets (and instantaneous load
> times). This allowed us to program most of our code directly in R
> instead of resorting to C because of performance or memory constraints.
>
> Could someone review the attached patch and spot any potential
> problems? Is a change like this likely to be integrated into the R
> sources? We would like to release our current R extension for anyone
> to use.
>
> Thanks.
> <r-extgc.diff>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


