[R-pkg-devel] [External] Re: Replacement for SETLENGTH

luke-tierney using uiowa.edu
Thu Jan 16 21:51:54 CET 2025


You may be over-complicating this. Taking mcmc_new in src/lm_lcmc.c
from https://github.com/merliseclyde/BAS, to minimize code changes I
would arrange the memory management along these lines:

SEXP mcmc_new(...)
{
     /* ... */

     /* create and protect ANS */
     SEXP ANS = PROTECT(allocVector(VECSXP, 16));

     /* create the work vectors; placing them in ANS protects them */
     SEXP modelspace = allocVector(VECSXP, nModels);
     SET_VECTOR_ELT(ANS, 1, modelspace);
     SEXP logmarg = allocVector(REALSXP, nModels);
     SET_VECTOR_ELT(ANS, 2, logmarg);
     SEXP modelprobs = allocVector(REALSXP, nModels);
     SET_VECTOR_ELT(ANS, 3, modelprobs);
     /* etc */

     /* do your computations */

     if (nUnique < nModels) {
         /* new values are protected via ANS;
            old ones are immediately available for GC */
         SET_VECTOR_ELT(ANS, 1, xlengthgets(modelspace, nUnique));
         SET_VECTOR_ELT(ANS, 2, xlengthgets(logmarg, nUnique));
         SET_VECTOR_ELT(ANS, 3, xlengthgets(modelprobs, nUnique));
         /* etc */
     }

     /* ... */
     UNPROTECT(1); /* ANS */
     return ANS;
}

Best,

luke

On Wed, 15 Jan 2025, Merlise Clyde, Ph.D. wrote:

>
> Thanks, Luke! I had seen the usage and discussion of growable vectors, as well as using SET_TRUELENGTH with SETLENGTH, and didn't necessarily want to get even further out of API compliance :-). But if that looks like it will be allowed (perhaps subject to changes), that seems like the better way forward for handling the different SEXPs. Switching to smaller vectors and enlarging them would also be much more efficient in terms of memory. I'll need to experiment with how much to expand by, since the enlargement would need to happen inside the loop, with a final resizing before returning.
>
> So, if I understand the suggestion, xlengthgets basically handles the body of the EnlargeVector function, i.e. the allocation and copying (but now with smaller vectors), followed by the extra step of calling SETLENGTH and SET_TRUELENGTH.
> If this is done within a loop over MCMC iterations, I would need to call SET_GROWABLE_BIT before the loop or when I first need to enlarge (so basically a local implementation of EnlargeVector).
>
> For my non-VECSXP objects (REALSXP, INTSXP) it might be more efficient to use Realloc on a working array within the loops, allocate and assign only after determining the final length (nUnique), and free the memory myself. That way I avoid SETLENGTH altogether for those types.
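
A minimal sketch of the working-array idea described above, written in plain C so that it compiles without the R headers. The type dbuf and the functions dbuf_push and dbuf_finish are hypothetical names introduced only for this illustration; they are not part of BAS or of R. In a package one would presumably use R_Realloc and R_Free from R_ext/RS.h in place of realloc and free, and replace the final malloc with allocVector(REALSXP, nUnique) followed by a memcpy into REAL() of the protected result. Note that memory obtained this way is not released automatically if an R error is raised before it is freed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer of doubles (illustrative names only). */
typedef struct {
    double *data;  /* working storage, owned by the caller */
    size_t  len;   /* number of elements in use */
    size_t  cap;   /* number of elements allocated */
} dbuf;

/* Append one value, doubling the capacity when the buffer is full.
   Returns 0 on success, -1 if the reallocation fails (in which case
   the old block is left untouched and still valid). */
static int dbuf_push(dbuf *b, double x)
{
    if (b->len == b->cap) {
        size_t ncap = b->cap ? 2 * b->cap : 16;
        double *p = realloc(b->data, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    b->data[b->len++] = x;
    return 0;
}

/* Copy the used part into an exact-size block and free the working
   buffer. In an R package this is the point where the final vector
   of length nUnique would be allocated and filled. */
static double *dbuf_finish(dbuf *b, size_t *n_out)
{
    size_t n = b->len;
    double *out = malloc((n ? n : 1) * sizeof *out);
    *n_out = n;
    if (out != NULL && n > 0)
        memcpy(out, b->data, n * sizeof *out);
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
    return out;
}
```

Doubling keeps the number of reallocations logarithmic in the final length; the growth factor and the initial capacity of 16 are arbitrary choices here and would need tuning for the actual MCMC loop.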
>
> best,
> Merlise
>
>
>
> Merlise Clyde (she/her/hers)
> Professor of Statistical Science and Director of Graduate Studies
> Duke University
>
>
> ________________________________________
> From: luke-tierney using uiowa.edu <luke-tierney using uiowa.edu>
> Sent: Wednesday, January 15, 2025 12:34 PM
> To: Iris Simmons <ikwsimmo using gmail.com>
> Cc: Merlise Clyde, Ph.D. <clyde using duke.edu>; List r-package-devel <r-package-devel using r-project.org>
> Subject: Re: [External] Re: [R-pkg-devel] Replacement for SETLENGTH
>  
> On Wed, 15 Jan 2025, Iris Simmons wrote:
>
>> I don't think memcpy works well for VECSXP. The elements being overwritten
>> need to have their reference counts decreased and the new elements need to
>> have theirs increased.
>
> You do not want to use memcpy or in any other way try to write to the
> locations in a VECSXP. It is not just the reference counts but also the
> integrity of the GC write barrier that you would be damaging.
>
>>
>> Also, I don't entirely know how accurate everything I'm about to say is,
>> but I think you need to be using SET_TRUELENGTH and SET_GROWABLE_BIT along
>> with SETLENGTH. There's an example here:
>>
>> https://github.com/wch/r-source/blob/744b5d34e1b8eb839e5d49d91ab21c1fe6800856/src/main/subassign.c#L257
>>
>>
>> The example uses SET_STDVEC_LENGTH which shouldn't be used, just replace it
>> with SETLENGTH.
>>
>> So in your code, I'd replace:
>>
>> SETLENGTH(modelspace, nUnique);
>>
>> with
>>
>> SET_GROWABLE_BIT(modelspace);
>> SET_TRUELENGTH(modelspace, nModels);
>> SETLENGTH(modelspace, nUnique);
>
> These are not part of the API.
>
> Support for growable vectors may be added to the API in the future, but
> probably with a more robust interface.
>
> In any case, this mechanism is intended for growing, not shrinking,
> vectors.
>
> Initially over-allocating and returning a smaller result is a
> reasonable strategy, but the right way to do it is to allocate a new
> shorter result. xlengthgets is a convenient way to do this. The longer
> vector will be subject to garbage collection once there are no
> remaining references to it.
>
> Attempting to keep alive a longer allocation but pretending it is
> shorter is mis-guided: it would keep alive a larger object than is
> needed and so waste memory.
>
> Best,
>
> luke
>
>> On Wed, Jan 15, 2025, 10:30 Merlise Clyde, Ph.D. <clyde using duke.edu> wrote:
>>
>>> Thanks for the added explanation Iris and Tomas!
>>>
>>> So looking at the code for xlengthgets, it does appear that I may take a
>>> memory hit for multiple large objects due to the second allocation before
>>> the old objects are possibly garbage collected. There are about 12 such
>>> instances per function that are returned (I do use a counter to keep
>>> track of the number of PROTECT calls to UNPROTECT, for bookkeeping :-)).
>>> For memory-limited machines, the alloc/copy was a problem for memory usage
>>> - and if I recall was one of the reasons in 2008 I switched to SETLENGTH,
>>> which doesn't seem to do an allocation ???  If there is going to be an
>>> absolute ban on SETLENGTH  in packages I'll probably need to address memory
>>> management differently for those cases.
>>>
>>> I did see a note before the function def'n of xlengthgets:
>>>
>>> /* (if it is vectorizable). We could probably be fairly */
>>> /* clever with memory here if we wanted to. */
>>>
>>> It would seem that memcpy would be more efficient for at least some of the
>>> types (REALSXP, INTSXP) unless I am missing something - but is there any way to be
>>> more clever with VECSXP?
>>>
>>> best,
>>> Merlise
>>>
>>>
>>>
>>> Merlise Clyde (she/her/hers)
>>> Professor of Statistical Science and Director of Graduate Studies
>>> Duke University
>>>
>>> ________________________________________
>>> From: Iris Simmons <ikwsimmo using gmail.com>
>>> Sent: Wednesday, January 15, 2025 1:00 AM
>>> To: Merlise Clyde, Ph.D. <clyde using duke.edu>
>>> Cc: r-package-devel using r-project.org <r-package-devel using r-project.org>
>>> Subject: Re: [R-pkg-devel] Replacement for SETLENGTH
>>>
>>> Hi Merlise!
>>>
>>>
>>> Referring to here:
>>>
>>>
>>> https://github.com/wch/r-source/blob/bb5a829466f77a3e1d03541747d149d65e900f2b/src/main/builtin.c#L834
>>>
>>> It seems as though the object is only re-used if the new length is
>>> equal to the old length.
>>>
>>> If you use Rf_lengthgets, you will need to protect the return value.
>>> The code you wrote that uses protect indexes looks correct, and the
>>> reprotect is good because you no longer need the old object.
>>>
>>> 2 is the correct amount to unprotect. PROTECT and PROTECT_WITH_INDEX
>>> (as far as I know) are the only functions that increase the size of
>>> the protect stack, and so the only calls that need to be unprotected.
>>> Typically, people define `int nprotect = 0;` at the start of their
>>> functions, add `nprotect++;` after each PROTECT and PROTECT_WITH_INDEX
>>> call, and add `UNPROTECT(nprotect);` immediately before each return or
>>> function end. That makes it easier to keep track.
>>>
>>> I typically use R_PreserveObject and R_ReleaseObject to protect
>>> objects without a need to bind them somewhere in my package's
>>> namespace. This would be that .onLoad() uses R_PreserveObject to
>>> protect some objects, and .onUnload uses R_ReleaseObject to release
>>> the protected objects. I probably would not use that for what you're
>>> describing.
>>>
>>>
>>> Regards,
>>>      Iris
>>>
>>> On Tue, Jan 14, 2025 at 11:26 PM Merlise Clyde, Ph.D. <clyde using duke.edu>
>>> wrote:
>>>>
>>>> I am trying to determine the best way to eliminate the use of SETLENGTH
>>> to truncate over allocated vectors in my package BAS to eliminate the NOTES
>>> about non-API calls in anticipation of R 4.5.0.
>>>>
>>>> From WRE:  "At times it can be useful to allocate a larger initial
>>> result vector and resize it to a shorter length if that is sufficient. The
>>> functions Rf_lengthgets and Rf_xlengthgets accomplish this; they are
>>> analogous to using length(x) <- n in R. Typically these functions return a
>>> freshly allocated object, but in some cases they may re-use the supplied
>>> object."
>>>>
>>>> it looks like using
>>>>
>>>>      x = Rf_lengthgets(x, newsize);
>>>>      SET_VECTOR_ELT(result, 0, x);
>>>>
>>>> before returning works to resize without the performance hit incurred by
>>>> a copy. (Will this always re-use the supplied object if newsize < old
>>>> size?)
>>>>
>>>> There is no mention in section 5.9.2 about the need for re-protection of
>>> the object,  but it seems to be mentioned in some packages as well as a
>>> really old thread about SET_LENGTH that looks like a  non-API MACRO to
>>> lengthgets,
>>>>
>>>> indeed if I call gc() and then rerun my test I have had some
>>> non-reproducible aborts in R Studio on my M3 Mac (caught once in R -d lldb)
>>>>
>>>> Do I need to do something more like
>>>>
>>>> PROTECT_INDEX ipx0;
>>>> PROTECT_WITH_INDEX(x0 = allocVector(REALSXP, old_size), &ipx0);
>>>>
>>>> PROTECT_INDEX ipx1;
>>>> PROTECT_WITH_INDEX(x1 = allocVector(REALSXP, old_size), &ipx1);
>>>>
>>>> /* fill in values in x0 and x1 up to new_size (random) < old_size */
>>>> ...
>>>> REPROTECT(x0 = Rf_lengthgets(x0, new_size), ipx0);
>>>> REPROTECT(x1 = Rf_lengthgets(x1, new_size), ipx1);
>>>>
>>>> SET_VECTOR_ELT(result, 0, x0);
>>>> SET_VECTOR_ELT(result, 1, x1);
>>>> ...
>>>> UNPROTECT(2);   /* or is this 4? */
>>>> return(result);
>>>>
>>>>
>>>> There is also a mention in WRE of R_PreserveObject and R_ReleaseObject -
>>>>
>>>> I am looking for advice on whether this is needed, or which approach is
>>>> better/more stable as a replacement for SETLENGTH. (I have many, many
>>>> instances that need to be updated, so I am trying to get some clarity here
>>>> before updating and running the code through valgrind or other sanitizers
>>>> to catch any memory issues before submitting an update to CRAN.)
>>>>
>>>> best,
>>>> Merlise
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ______________________________________________
>>>> R-package-devel using r-project.org mailing list
>>>>
>>>> https://stat.ethz.ch/mailman/listinfo/r-package-devel
>>
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>     Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu/

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu/

