[R-pkg-devel] [External] Re: Replacement for SETLENGTH

luke-tierney using uiowa.edu
Wed Jan 15 18:34:20 CET 2025


On Wed, 15 Jan 2025, Iris Simmons wrote:

> I don't think memcpy works well for VECSXP. The elements being overwritten
> need to have their reference counts decreased and the new elements need to
> have theirs increased.

You do not want to use memcpy, or in any other way try to write directly
to the locations in a VECSXP. It is not just the reference counts but also
the integrity of the GC write barrier that you would be damaging.

>
> Also, I don't entirely know how accurate everything I'm about to say is,
> but I think you need to be using SET_TRUELENGTH and SET_GROWABLE_BIT along
> with SETLENGTH. There's an example here:
>
> https://github.com/wch/r-source/blob/744b5d34e1b8eb839e5d49d91ab21c1fe6800856/src/main/subassign.c#L257
>
>
> The example uses SET_STDVEC_LENGTH which shouldn't be used, just replace it
> with SETLENGTH.
>
> So in your code, I'd replace:
>
> SETLENGTH(modelspace, nUnique);
>
> with
>
> SET_GROWABLE_BIT(modelspace);
> SET_TRUELENGTH(modelspace, nModels);
> SETLENGTH(modelspace, nUnique);

SET_GROWABLE_BIT, SET_TRUELENGTH, and SETLENGTH are not part of the API.

Support for growable vectors may be added to the API in the future, but
probably with a more robust interface.

In any case, this mechanism is intended for growing, not shrinking,
vectors.

Initially over-allocating and returning a smaller result is a
reasonable strategy, but the right way to do it is to allocate a new
shorter result. xlengthgets is a convenient way to do this. The longer
vector will be subject to garbage collection once there are no
remaining references to it.

Attempting to keep alive a longer allocation but pretending it is
shorter is misguided: it would keep alive a larger object than is
needed and so waste memory.

Best,

luke

> On Wed, Jan 15, 2025, 10:30 Merlise Clyde, Ph.D. <clyde using duke.edu> wrote:
>
>> Thanks for the added explanation Iris and Tomas!
>>
>> So looking at the code for xlengthgets, it does appear that I may take a
>> memory hit for multiple large objects due to the second allocation before
>> the old objects are possibly garbage collected.     There are about 12 such
>> instances per function that are returned (I do use a counter for keeping
>> track of the number of PROTECTED and to UNPROTECT for bookkeeping :-).
>>  For memory limited machines, the alloc/copy was a problem for memory usage
>> - and if I recall was one of the reasons in 2008 I switched to SETLENGTH,
>> which doesn't seem to do an allocation ???  If there is going to be an
>> absolute ban on SETLENGTH  in packages I'll probably need to address memory
>> management differently for those cases.
>>
>> I did see a note before the function def'n of xlengthgets:
>>
>> /* (if it is vectorizable). We could probably be fairly */
>> /* clever with memory here if we wanted to. */
>>
>> It would seem that memcpy would be more efficient for at least some of the
>> types  (REALSXP, INTSXP) unless I am missing something - but any way to be
>> more clever with VECSXP ?
>>
>> best,
>> Merlise
>>
>>
>>
>> Merlise Clyde (she/her/hers)
>> Professor of Statistical Science and Director of Graduate Studies
>> Duke University
>>
>> ________________________________________
>> From: Iris Simmons <ikwsimmo using gmail.com>
>> Sent: Wednesday, January 15, 2025 1:00 AM
>> To: Merlise Clyde, Ph.D. <clyde using duke.edu>
>> Cc: r-package-devel using r-project.org <r-package-devel using r-project.org>
>> Subject: Re: [R-pkg-devel] Replacement for SETLENGTH
>>
>> Hi Merlise!
>>
>>
>> Referring to here:
>>
>>
>> https://github.com/wch/r-source/blob/bb5a829466f77a3e1d03541747d149d65e900f2b/src/main/builtin.c#L834
>>
>> It seems as though the object is only re-used if the new length is
>> equal to the old length.
>>
>> If you use Rf_lengthgets, you will need to protect the return value.
>> The code you wrote that uses protect indexes looks correct, and the
>> reprotect is good because you no longer need the old object.
>>
>> 2 is the correct amount to unprotect. PROTECT and PROTECT_WITH_INDEX
>> (as far as I know) are the only functions that increase the size of
>> the protect stack, and so the only calls that need to be unprotected.
>> Typically, people define `int nprotect = 0;` at the start of their
>> functions, add `nprotect++;` after each PROTECT and PROTECT_WITH_INDEX
>> call, and add `UNPROTECT(nprotect);` immediately before each return or
>> function end. That makes it easier to keep track.
>>
>> I typically use R_PreserveObject and R_ReleaseObject to protect
>> objects without a need to bind them somewhere in my package's
>> namespace. This would be that .onLoad() uses R_PreserveObject to
>> protect some objects, and .onUnload uses R_ReleaseObject to release
>> the protected objects. I probably would not use that for what you're
>> describing.
>>
>>
>> Regards,
>>     Iris
>>
>> On Tue, Jan 14, 2025 at 11:26 PM Merlise Clyde, Ph.D. <clyde using duke.edu>
>> wrote:
>>>
>>> I am trying to determine the best way to eliminate the use of SETLENGTH
>> to truncate over allocated vectors in my package BAS to eliminate the NOTES
>> about non-API calls in anticipation of R 4.5.0.
>>>
>>> From WRE:  "At times it can be useful to allocate a larger initial
>> result vector and resize it to a shorter length if that is sufficient. The
>> functions Rf_lengthgets and Rf_xlengthgets accomplish this; they are
>> analogous to using length(x) <- n in R. Typically these functions return a
>> freshly allocated object, but in some cases they may re-use the supplied
>> object."
>>>
>>> it looks like using
>>>
>>>     x = Rf_lengthgets(x, newsize);
>>>     SET_VECTOR_ELT(result, 0, x);
>>>
>>> before returning works to resize without a performance hit that incurs
>> with a copy.  (will this always re-use the supplied object if newsize < old
>> size?)
>>>
>>> There is no mention in section 5.9.2 about the need for re-protection of
>> the object,  but it seems to be mentioned in some packages as well as a
>> really old thread about SET_LENGTH that looks like a  non-API MACRO to
>> lengthgets,
>>>
>>> indeed if I call gc() and then rerun my test I have had some
>> non-reproducible aborts in R Studio on my M3 Mac (caught once in R -d lldb)
>>>
>>> Do I need to do something more like
>>>
>>> PROTECT_INDEX ipx0;
>>> PROTECT_WITH_INDEX(x0 = allocVector(REALSXP, old_size), &ipx0);
>>>
>>> PROTECT_INDEX ipx1;
>>> PROTECT_WITH_INDEX(x1 = allocVector(REALSXP, old_size), &ipx1);
>>>
>>> /* fill in values in x0 and x1 up to new_size (random) < old_size */
>>> ...
>>> REPROTECT(x0 = Rf_lengthgets(x0, new_size), ipx0);
>>> REPROTECT(x1 = Rf_lengthgets(x1, new_size), ipx1);
>>>
>>> SET_VECTOR_ELT(result, 0, x0);
>>> SET_VECTOR_ELT(result, 1, x1);
>>> ...
>>> UNPROTECT(2);   /* or is this 4? */
>>> return(result);
>>>
>>>
>>> There is also a mention in WRE of R_PreserveObject and R_ReleaseObject -
>>>
>>> looking for advice if this is needed, or which approach is better/more
>> stable to replace SETLENGTH?   (I have many many instances that need to be
>> updated, so trying to get some clarity here before updating and running
>> code through valgrind or other sanitizers to catch any memory issues before
>> submitting an update to CRAN.
>>>
>>> best,
>>> Merlise
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> ______________________________________________
>>> R-package-devel using r-project.org mailing list
>>>
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu/
