[Rd] Changing arguments inside .Call. Wise to encourage "const" on all arguments?

Mon Dec 10 20:05:23 CET 2012

On Dec 10, 2012, at 1:51 AM, Paul Johnson wrote:

> I'm continuing my work on finding speedups in generalized inverse
> calculations in some simulations.  It leads me back to .C and .Call,
> and some questions I've never been able to answer for myself.  It may
> be I can push some calculations to LAPACK or C BLAS, that's why I
> realized again I don't understand the call by reference or value
> semantics of .Call
> 
> Why aren't users of .Call encouraged to "const" their arguments, and
> why doesn't .Call do this for them (if we really believe in return by
> value)?
> 

Because there is a difference between the *data* part of the SEXP and the object itself. The internal structure of the object may need to be modified in a call to the R API (e.g. the NAMED reference count is increased when you assign it). You can't flag the data part as const separately, so you have to use a non-const SEXP.
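A related C detail can be sketched with a toy type. SEXP is a pointer typedef, so writing "const SEXP" makes the pointer const, not the object it points to. The names below (toy_rec, TOY_SEXP, toy_assign, toy_first) are hypothetical stand-ins and not part of the R API; the struct is only a rough model of "header plus data", not R's real layout.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an R object: a header plus a data payload.
   This is NOT R's real layout, only a model for illustration. */
struct toy_rec {
    int named;        /* bookkeeping, loosely modelled on NAMED */
    double data[4];   /* the "data part" */
};
typedef struct toy_rec *TOY_SEXP;   /* pointer typedef, like SEXP */

/* `const TOY_SEXP` means "const pointer to toy_rec", not
   "pointer to const toy_rec": the pointee stays writable. */
void toy_assign(const TOY_SEXP x)
{
    assert(x != NULL);
    x->named += 1;        /* compiles: header bookkeeping still works */
    x->data[0] = -1.0;    /* also compiles: const did not protect the data */
}

/* A pointer-to-const protects everything, header included, so a
   bookkeeping update here would be rejected at compile time. */
double toy_first(const struct toy_rec *x)
{
    assert(x != NULL);
    /* x->named += 1;   <- would not compile */
    return x->data[0];
}
```

In this model there is no declaration that leaves "named" writable while making "data" read-only through the same pointer, which is the limitation described above.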

> R Gentleman's R Programming for Bioinformatics is the most
> understandable treatment I've found on .Call. It appears to me .Call
> leaves "wiggle room" where there should be none.  Here's Gentleman on
> p. 185. "For .Call and .External, the return value is an R object (the
> C functions must return a SEXP), and for these functions the values
> that were passed are typically not modified.  If they must be
> modified, then making a copy in R, prior to invoking the C code, is
> necessary."
> 
> I *think* that means:
> 
> .Call allows return by reference, BUT we really wish users would not
> use it. Users can damage R data structures that are pointed to unless
> they really truly know what they are doing on the C side. ??
> 
> This seems dangerous. Why allow return  by reference at all?
> 

Because it is completely legal to do things like

SEXP last(SEXP bar) {
    if (TYPEOF(bar) == VECSXP && LENGTH(bar) > 0)
        return VECTOR_ELT(bar, LENGTH(bar) - 1);
    Rf_error("sorry, I only work on lists");
}

There is no point in duplicating the element.

> On p. 197, there's a similar comment  "Any function that has been
> invoked by either .External or .Call will have all of its arguments
> protected already. You do not need to protect them. .... [T]hey were
> not duplicated and should be treated as read-only values."
> 
> "should be ... read-only" concerns me. They are "protected" in the
> garbage collector sense,

Yes

> but they are not protected from "return by
> reference" damage. Right?
> 

There is no "return by reference damage".

The only problem is if you modify input arguments while someone else holds a reference to them, but there is no way in C to prevent that while still allowing them to be useful. Note that it is legal to modify input arguments if there are no other references to them.
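As a rough illustration of that rule, here is a self-contained toy model, not R code. The type toy_vec and the functions toy_duplicate, toy_double and toy_free are hypothetical names invented for this sketch. The "named" field only loosely imitates what NAMED() reports, and toy_duplicate plays the role that duplicate() plays in the R API. The function modifies its argument in place when nobody else holds it and works on a copy otherwise.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy vector: `named` > 0 means someone else may hold a reference. */
struct toy_vec {
    int named;
    size_t n;
    double *data;
};

/* Deep copy; the copy starts with no other references.
   Returns NULL if allocation fails. */
struct toy_vec *toy_duplicate(const struct toy_vec *x)
{
    struct toy_vec *y = malloc(sizeof *y);
    if (y == NULL) return NULL;
    y->named = 0;
    y->n = x->n;
    y->data = x->n ? malloc(x->n * sizeof *y->data) : NULL;
    if (x->n && y->data == NULL) { free(y); return NULL; }
    if (x->n) memcpy(y->data, x->data, x->n * sizeof *y->data);
    return y;
}

/* Double every element. Modify in place only when nobody else
   holds x; otherwise leave x alone and return a modified copy. */
struct toy_vec *toy_double(struct toy_vec *x)
{
    struct toy_vec *out = x;
    assert(x != NULL);
    if (x->named > 0) {
        out = toy_duplicate(x);
        if (out == NULL) return NULL;
    }
    for (size_t i = 0; i < out->n; i++)
        out->data[i] *= 2.0;
    return out;
}

/* Release a vector obtained from toy_duplicate. */
void toy_free(struct toy_vec *x)
{
    if (x != NULL) { free(x->data); free(x); }
}
```

The caller cannot tell from the signature which path was taken, which is why returning the result (rather than relying on the argument having changed) is the safer convention in this sketch.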

Cheers,
Simon

> Why doesn't the documentation recommend function writers to mark
> arguments to C functions as const?  Isn't that what the return by
> value policy would suggest?
> 
> Here's a troublesome example in  R src/main/array.c:
> 
> /* DropDims strips away redundant dimensioning information. */
> /* If there is an appropriate dimnames attribute the correct */
> /* element is extracted and attached to the vector as a names */
> /* attribute.  Note that this function mutates x. */
> /* Duplication should occur before this is called. */
> 
> SEXP DropDims(SEXP x)
> {
>    SEXP dims, dimnames, newnames = R_NilValue;
>    int i, n, ndims;
> 
>   PROTECT(x);
>   dims = getAttrib(x, R_DimSymbol);
> [... SNIP ....]
>    setAttrib(x, R_DimNamesSymbol, R_NilValue);
>    setAttrib(x, R_DimSymbol, R_NilValue);
>    setAttrib(x, R_NamesSymbol, newnames);
> [... SNIP ....]
> 
> return x;
> }
> 
> Well, at least there's a warning comment with that one.  But it does
> show .Call allows call by reference.
> 
> Why allow it, though? DropDims could copy x, modify the copy, and return it.
> 
> I wondered why DropDims bothers to return x at all. We seem to be
> using modify and return by reference there.
> 
> I also wondered why x is PROTECTED, actually. It's an argument, wasn't
> it automatically protected? Is it no longer  protected after the
> function starts modifying it?
> 
> Here's an  example with similar usage in Writing R Extensions, section
> 5.10.1 "Calling .Call".  It protects the arguments a and b (needed
> ??), then changes them.
> 
> #include <R.h>
> #include <Rdefines.h>
> 
>     SEXP convolve2(SEXP a, SEXP b)
>     {
>         R_len_t i, j, na, nb, nab;
>         double *xa, *xb, *xab;
>         SEXP ab;
> 
>         PROTECT(a = AS_NUMERIC(a)); /* PJ wonders, doesn't this alter "a" in calling code? */
>         PROTECT(b = AS_NUMERIC(b));
>         na = LENGTH(a); nb = LENGTH(b); nab = na + nb - 1;
>         PROTECT(ab = NEW_NUMERIC(nab));
>         xa = NUMERIC_POINTER(a); xb = NUMERIC_POINTER(b);
>         xab = NUMERIC_POINTER(ab);
>         for(i = 0; i < nab; i++) xab[i] = 0.0;
>         for(i = 0; i < na; i++)
>              for(j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
>         UNPROTECT(3);
>         return(ab);
>     }
> 
> 
> Doesn't
> 
>        PROTECT(a = AS_NUMERIC(a));
> 
> alter the data structure "a" both inside the C function and
> in the calling R code as well? And, if a was PROTECTED automatically,
> could we do without that PROTECT()?
> 
> pj
> 
> -- 
> Paul E. Johnson
> Professor, Political Science      Assoc. Director
> 1541 Lilac Lane, Room 504      Center for Research Methods
> University of Kansas                 University of Kansas
> http://pj.freefaculty.org               http://quant.ku.edu
> 