[Rd] Recycling memory with a small free list

Thu Feb 19 02:17:14 CET 2015

On Wed, Feb 18, 2015 at 7:19 AM, Radford Neal <radford at cs.toronto.edu> wrote:
>> ... with assignments inside of loops like this:
>>
>> reweight = function(iter, w, Q) {
>>   for (i in 1:iter) {
>>     wT = w * Q
>>   }
>> }
>> ... before the RHS is executed, the LHS allocation would be added
>> to a small fixed length list of available space which is checked
>> before future allocations.   If the same size is requested before the
>> next garbage collection, the allocation is short-circuited and the
>> allocation is reused.   This list could be very small, possibly even
>> only a single entry.  Entries would only be put on the list if they
>> have no other references.

Here's an article about the benefits of this approach in Go that might
explain better than I was able:
https://blog.cloudflare.com/recycling-memory-buffers-in-go/
Their charts explain the goal very clearly: stabilize at a smaller
amount of memory to reduce churn, which improves performance in a
myriad of ways.

> Reusing the LHS storage immediately isn't possible in general, because
> evaluation of the RHS might produce an error, in which case the LHS
> variable is supposed to be unchanged.

What's the guarantee R actually makes?  What's an example of the use
case where this behaviour would be required? More generally, can one
not assume "a = NULL; a = func()" is equivalent to "a = func()" unless
func() references 'a' or has it as an argument?  Or is the difficulty
that there is no way to know in advance it if will be referenced?

> Detecting special cases where
> there is guaranteed to be no error, or at least no error after the
> first modification to newly allocated memory, might be too
> complicated.

Yes, if required, the complexity of guaranteeing this might  well rule
out the approach I suggested.

> Putting the LHS storage on a small free list for later reuse (only
> after the old value of the variable will definitely be replaced) seems
> more promising (then one would need only two copies for examples such
> as above, with them being used in alternate iterations).

OK, let's consider that potentially easier option instead:  do nothing
immediately, but add a small queue for recycling from which the
temporary might be drawn.   It has slightly worse cache behavior, but
should handle most of the issues with memory churn.

> However,
> there's a danger of getting carried away and essentially rewriting
> malloc.  To avoid this, one might try just calling "free" on the
> no-longer-needed object, letting "malloc" then figure out when it can
> be re-used.

Yes, I think that's what I was anticipating:  add a free() equivalent
that does nothing if the object has multiple references/names, but
adds the object to small fixed size "free list" if it does not.
Perhaps this is only for certain types or for objects above a certain
size.

When requesting memory, allocvector() or perhaps R_alloc() does a
quick check of that "free list" to see if it has anything of the exact
requested size.  If it does, it short circuits and recycles it.  If it
doesn't, normal allocation takes place.

The "free list" is stored as two small fixed size arrays containing
size/address pairs.   Searching is done linearly using code that
optimizes to SIMD comparisons.   For 4/8/16 slots overhead of the
search should be unmeasurably fast.

The key to the approach would be keeping it simple, and realizing that
the goal is only to get the lowest hanging fruit:  repeated
assignments of large arrays used in a loop.  If it's complex, skip it
--- the behavior will be no worse than current.

By the way, what's happening with Luke's refcnt patches?  From the
outside, they seem like a great improvement.
http://homepage.stat.uiowa.edu/~luke/talks/dsc2014.pdf
http://developer.r-project.org/Refcnt.html
Are they slated to become the standard approach?  Are they going to be dropped?
 Will both approaches be kept in parallel?

> Unfortunately, that seems not to be safe, because it's
> possible that there is a reference to the no-longer-needed object on
> the PROTECT stack, even though no one should actually be looking at
> it any more.

Can you explain this case?   I don't think I understand it.

> In the current version of pqR (see pqR-project.org), modifications are
> (often) done in place for statements such as w = w * Q, but not
> curretly when the LHS variable does not appear on the RHS.

Yes, I looked at it earlier, and was excited to see that Luke had
ported half of your approach to standard R:
https://github.com/wch/r-source/blob/trunk/src/main/arithmetic.h#L65

But only the RHS temporary variables optimizations made it over. Your
LHS "w = w * Q" optimizations did not, but I didn't see any discussion
of why.   Was
it attempted and issues were found?

I think what I'm suggesting is complementary to that.   Direct reuse
is best if it can be detected, but recycling will provide more
opportunities for optimization.  Of course, what I'm suggesting is
always quite obvious, and I presume it's part what he includes in the
slide in his talk that mentions "Explore releasing memory when
reference count drops to zero".

--nate