[R-pkg-devel] Replacement for SETLENGTH

Wed Jan 15 09:58:11 CET 2025

On 1/15/25 05:26, Merlise Clyde, Ph.D. wrote:
> I am trying to determine the best way to eliminate the use of SETLENGTH to truncate over allocated vectors in my package BAS to eliminate the NOTES about non-API calls in anticipation of R 4.5.0.
>
> From WRE: "At times it can be useful to allocate a larger initial result vector and resize it to a shorter length if that is sufficient. The functions Rf_lengthgets and Rf_xlengthgets accomplish this; they are analogous to using length(x) <- n in R. Typically these functions return a freshly allocated object, but in some cases they may re-use the supplied object."
>
> it looks like using
>
>      x = Rf_lengthgets(x, newsize);
>      SET_VECTOR_ELT(result, 0, x);
>      
> before returning works to resize without the performance hit that a copy incurs. (Will this always re-use the supplied object if newsize < old size?)
>
> There is no mention in section 5.9.2 of the need to re-protect the object, but it seems to be mentioned in some packages, as well as in a really old thread about SET_LENGTH, which looks like a non-API macro wrapping lengthgets.
>
> Indeed, if I call gc() and then rerun my test, I have had some non-reproducible aborts in RStudio on my M3 Mac (caught once in R -d lldb).

The important part for protection is that Rf_lengthgets _may_ return a 
freshly allocated object. This means that the object needs protection 
from garbage collection, implicit or explicit, and that is covered in 
the section "Handling the effects of garbage collection". There are many 
functions in the R API that return freshly allocated objects, so don't 
expect the documentation of every such function to give advice on how 
to protect the result; that is covered in that dedicated section.

So, you are right, some protection is needed _if_ the return value of 
Rf_lengthgets may be exposed to gc().

>
> Do I need to do something more like
>
> PROTECT_INDEX ipx0;
> PROTECT_WITH_INDEX(x0 = allocVector(REALSXP, old_size), &ipx0);
>
> PROTECT_INDEX ipx1;
> PROTECT_WITH_INDEX(x1 = allocVector(REALSXP, old_size), &ipx1);
>
> /* fill in values in x0 and x1 up to new_size (random) < old_size */
> ...
> REPROTECT(x0 = Rf_lengthgets(x0, new_size), ipx0);
> REPROTECT(x1 = Rf_lengthgets(x1, new_size), ipx1);
>
> SET_VECTOR_ELT(result, 0, x0);
> SET_VECTOR_ELT(result, 1, x1);
> ...
> UNPROTECT(2);   /* or is this 4? */

You have protected two objects here, one in x0 and one in x1 
(REPROTECT doesn't change the depth of the protection stack). Some 
people put that into a comment:

UNPROTECT(2); /* x1, x0 */
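To make the counting concrete, here is a toy model in plain C of a pointer-protection stack. All names are hypothetical and this is not R's implementation; it only mirrors the bookkeeping described above: a protect-with-index call pushes a slot and records its index, a reprotect call overwrites that slot in place, and unprotect(n) pops n slots. Two protects followed by two reprotects therefore leave a depth of 2, which is why UNPROTECT(2) is the right call.

```c
#include <assert.h>

/* Toy pointer-protection stack (hypothetical names, not R's implementation).
   A slot index plays the role of PROTECT_INDEX. */
#define PS_STACK_MAX 64

static void *ps_stack[PS_STACK_MAX];
static int ps_top = 0; /* current depth of the stack */

/* Like PROTECT_WITH_INDEX: push ptr and record which slot it occupies. */
static void *ps_protect_with_index(void *ptr, int *index)
{
    assert(ps_top < PS_STACK_MAX);
    *index = ps_top;
    ps_stack[ps_top++] = ptr;
    return ptr;
}

/* Like REPROTECT: overwrite an existing slot; the depth does not change. */
static void *ps_reprotect(void *ptr, int index)
{
    assert(index >= 0 && index < ps_top);
    ps_stack[index] = ptr;
    return ptr;
}

/* Like UNPROTECT(n): pop the n most recently pushed slots. */
static void ps_unprotect(int n)
{
    assert(n >= 0 && n <= ps_top);
    ps_top -= n;
}

static int ps_depth(void) { return ps_top; }
```

Protecting x0 and x1 pushes two slots; replacing both with their shortened versions via ps_reprotect leaves ps_depth() at 2, and ps_unprotect(2) brings it back to 0.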

The code above is ok. In some cases, you can shuffle it around a bit or 
rely on implicit protection if you want to reduce the need for explicit 
protection. But performance-wise it doesn't matter: code that allocates 
spends much more time on that than on protection, so it is more about 
readability.

For instance,

result = PROTECT(allocVector(...));
x0 = allocVector(...);
SET_VECTOR_ELT(result, 0, x0);
// now x0 is implicitly protected via result
...

x0 = Rf_lengthgets(...);
SET_VECTOR_ELT(result, 0, x0);
// now the new value of x0 is implicitly protected via result
// (the old value may not be)

UNPROTECT(1); // result
return result;
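The implicit protection in the example above comes from reachability: the collector keeps whatever it can reach from a protected object. A toy mark phase in plain C makes this visible. The types and names are hypothetical (a two-slot Obj loosely standing in for a length-2 list), and R's collector is far more involved. After the element slot of the protected container is overwritten, the new object is marked and the old one is not, matching the comment that the old value may no longer be protected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object with two element slots, loosely like a length-2 list.
   Hypothetical type, not R's SEXPREC. */
typedef struct Obj {
    struct Obj *elt[2];
    bool marked;
} Obj;

#define ROOT_MAX 16
static Obj *roots[ROOT_MAX]; /* stands in for the protection stack */
static int n_roots = 0;

static Obj *toy_protect(Obj *o)
{
    assert(n_roots < ROOT_MAX);
    roots[n_roots++] = o;
    return o;
}

static void toy_unprotect(int n)
{
    assert(n >= 0 && n <= n_roots);
    n_roots -= n;
}

/* Like SET_VECTOR_ELT: store a reference inside the container. */
static void toy_set_elt(Obj *container, int i, Obj *value)
{
    assert(i >= 0 && i < 2);
    container->elt[i] = value;
}

/* Recursively mark everything reachable from o. */
static void mark(Obj *o)
{
    if (o == NULL || o->marked)
        return;
    o->marked = true;
    for (int i = 0; i < 2; i++)
        mark(o->elt[i]);
}

/* Mark phase of a toy collector: clear all marks, then mark everything
   reachable from the roots. Unmarked objects would be freed. */
static void toy_gc_mark(Obj *heap[], int n_heap)
{
    for (int i = 0; i < n_heap; i++)
        heap[i]->marked = false;
    for (int i = 0; i < n_roots; i++)
        mark(roots[i]);
}
```

Only the container is ever pushed with toy_protect; its elements survive a collection solely because they are reachable from it.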

> return(result);
>
>
> There is also a mention in WRE of R_PreserveObject and R_ReleaseObject -
>
> I am looking for advice on whether this is needed, or which approach is better/more stable for replacing SETLENGTH. (I have many, many instances that need to be updated, so I am trying to get some clarity here before updating and running the code through valgrind or other sanitizers to catch any memory issues before submitting an update to CRAN.)

R_PreserveObject/R_ReleaseObject is good, e.g., for global structures, 
but probably not in this case. The difficulty there is making sure that 
R_ReleaseObject() is executed in case of an error, which is a non-local 
return. On the other hand, protection via PROTECT/UNPROTECT is 
automatically robust to non-local returns (automatic unprotection).
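The asymmetry can be modelled in plain C with setjmp/longjmp, the mechanism R's error handling uses to unwind (in much simplified form here). The names are hypothetical, and the two counters only stand in for the PROTECT stack depth and the set of preserved objects. A toy error jumps out of the routine before either cleanup line runs; the toy top level restores the saved stack depth, as R does for the protection stack, but nothing restores the preserve count, so the preserved object leaks.

```c
#include <assert.h>
#include <setjmp.h>

/* Toy state (hypothetical names): a protection-stack depth and a count
   of "preserved" objects that must be released by hand. */
static int pp_top = 0;      /* like the depth of the PROTECT stack */
static int n_preserved = 0; /* like objects held via R_PreserveObject */
static jmp_buf toplevel;    /* where a toy error jumps back to */

/* Toy error: a non-local return to the saved top level. */
static void toy_error(void) { longjmp(toplevel, 1); }

/* Body of a toy native routine that fails half-way. */
static void failing_routine(void)
{
    pp_top++;      /* PROTECT(x) */
    n_preserved++; /* R_PreserveObject(y) */
    toy_error();   /* error: nothing below this line runs */
    n_preserved--; /* R_ReleaseObject(y): skipped */
    pp_top--;      /* UNPROTECT(1): skipped */
}

/* Toy top-level context: remember the stack depth and restore it after
   an error. Returns 1 if an error occurred, 0 otherwise. */
static int toy_call(void (*routine)(void))
{
    int saved_top = pp_top; /* not modified after setjmp, so safe to read */
    if (setjmp(toplevel) != 0) {
        pp_top = saved_top; /* automatic unprotection */
        return 1;
    }
    routine();
    return 0;
}
```

After toy_call(failing_routine) the protection depth is back to 0, while n_preserved stays at 1: the hand-managed release was skipped by the non-local return.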

There is nothing specific about Rf_lengthgets with respect to protection 
here; the same rules apply to any other R API function that returns a SEXP.

For finding protection bugs in code, one can use an R build with write 
barrier checking enabled (configured with --enable-strict-barrier) 
together with gctorture, or the rchk tool. Some bugs may lead to 
crashes or incorrect outputs even in normal builds. Some bugs may be 
found by UBSAN. But none of this is a verification tool: one can only 
find some bugs in some cases, and correctness remains the 
responsibility of the programmer.

Best
Tomas

>
> best,
> Merlise
>
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel