[Rd] Changing arguments inside .Call. Wise to encourage "const" on all arguments?

Simon Urbanek simon.urbanek at r-project.org
Mon Dec 10 22:48:40 CET 2012


On Dec 10, 2012, at 2:05 PM, Simon Urbanek wrote:

> 
> On Dec 10, 2012, at 1:51 AM, Paul Johnson wrote:
> 
>> I'm continuing my work on finding speedups in generalized inverse
>> calculations in some simulations.  It leads me back to .C and .Call,
>> and some questions I've never been able to answer for myself.  It may
>> be I can push some calculations to LAPACK or C BLAS; that's why I
>> realized again that I don't understand the call-by-reference or
>> call-by-value semantics of .Call.
>> 
>> Why aren't users of .Call encouraged to "const" their arguments, and
>> why doesn't .Call do this for them (if we really believe in return by
>> value)?
>> 
> 
> Because there is a difference between the *data* part of the SEXP and the object itself. The internal structure of the object may need to be modified in a call to the R API (e.g. its NAMED reference count is increased when you assign it). You can't flag the data part as const separately, so you have to use a non-const SEXP.
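A toy model in plain C may make the header/data distinction concrete. Everything below is hypothetical: `toy_obj` and `toy_mark_named()` are illustrative stand-ins, not R's real SEXPREC layout or API. The sketch shows a function that promises not to touch the payload (by reading it through a `const double *`) while the object pointer itself has to stay non-const, because the runtime writes to the header. In real R code the corresponding read-only idiom is `const double *px = REAL(x);`.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an R object: a header that the
 * runtime updates (a NAMED-style counter) plus a pointer to the payload.
 * This is NOT R's actual SEXPREC layout. */
typedef struct toy_obj {
    int named;      /* header: bookkeeping the runtime may change */
    size_t length;
    double *data;   /* payload: what the user thinks of as "the value" */
} toy_obj;

/* Hypothetical API call that, like binding a value to a name in R,
 * must write to the header, so it cannot take a const toy_obj *. */
static void toy_mark_named(toy_obj *x) {
    if (x->named < 2)
        x->named++;
}

/* Reads the payload only, and says so with const double *. The object
 * pointer stays non-const because toy_mark_named() writes the header. */
static double toy_sum_and_bind(toy_obj *x) {
    const double *px = x->data;
    double s = 0.0;
    for (size_t i = 0; i < x->length; i++)
        s += px[i];
    toy_mark_named(x);
    return s;
}
```

One more C detail bears on the const question: since SEXP is a pointer typedef, declaring an argument as `const SEXP x` makes only the local pointer variable constant, not the object it points to, so it would not give the protection asked about anyway.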
> 
> 
>> R Gentleman's R Programming for Bioinformatics is the most
>> understandable treatment I've found on .Call. It appears to me .Call
>> leaves "wiggle room" where there should be none.  Here's Gentleman on
>> p. 185. "For .Call and .External, the return value is an R object (the
>> C functions must return a SEXP), and for these functions the values
>> that were passed are typically not modified.  If they must be
>> modified, then making a copy in R, prior to invoking the C code, is
>> necessary."
>> 
>> I *think* that means:
>> 
>> .Call allows return by reference, BUT we really wish users would not
>> use it. Users can damage R data structures that are pointed to unless
>> they really truly know what they are doing on the C side. ??
>> 
>> This seems dangerous. Why allow return by reference at all?
>> 
> 
> Because it is completely legal to do things like
> 
> SEXP last(SEXP bar) {
>   if (TYPEOF(bar) == VECSXP && LENGTH(bar) > 0)
>     return VECTOR_ELT(bar, LENGTH(bar) - 1);
>   Rf_error("sorry, I only work on lists");
> }
> 

Martin Morgan pointed out that this example is a bad one -- which is true. The common idiom that is safe is

SEXP foo(SEXP bar) {
...
return bar;
}

However, the last() example above is bad, because returning the list element directly is not safe: its NAMED value need not reflect that the list still refers to it, so a later in-place modification could silently change the list too. The conservative approach would be to use duplicate(); the more efficient one would be to bump up NAMED on the element. Sorry, my bad. I guess I was rather strengthening Paul's point: duplicate() when in doubt, even if it's less efficient :).
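The two fixes for last() can be sketched with hypothetical toy types (not R's API). toy_last_shared() returns the element itself but first marks it as possibly shared, the analogue of bumping NAMED, so code that later wants to modify it knows to copy first. toy_last_copy() returns an independent copy, the analogue of duplicate(), which costs an allocation but can never alias the list.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified object: a sharing marker plus a payload. */
typedef struct toy_obj {
    int named;      /* 0 = unreferenced, 2 = possibly shared (NAMED-style) */
    size_t length;
    double *data;
} toy_obj;

/* Hypothetical list: an array of element pointers. */
typedef struct toy_list {
    size_t length;
    toy_obj **elts;
} toy_list;

/* Efficient option: return the element itself, but mark it as shared so
 * that code which later wants to modify it copies it first. */
static toy_obj *toy_last_shared(toy_list *lst) {
    if (lst->length == 0)
        return NULL;
    toy_obj *elt = lst->elts[lst->length - 1];
    elt->named = 2;
    return elt;
}

/* Conservative option: return an independent deep copy of the element.
 * The caller owns the copy and releases it with toy_free(). */
static toy_obj *toy_last_copy(const toy_list *lst) {
    if (lst->length == 0)
        return NULL;
    const toy_obj *elt = lst->elts[lst->length - 1];
    toy_obj *cp = malloc(sizeof *cp);
    if (cp == NULL)
        return NULL;
    cp->named = 0;
    cp->length = elt->length;
    cp->data = malloc((elt->length ? elt->length : 1) * sizeof *cp->data);
    if (cp->data == NULL) {
        free(cp);
        return NULL;
    }
    if (elt->length > 0)
        memcpy(cp->data, elt->data, elt->length * sizeof *cp->data);
    return cp;
}

static void toy_free(toy_obj *x) {
    if (x != NULL) {
        free(x->data);
        free(x);
    }
}
```

The shared version is cheaper but is only safe if every writer honours the marker; the copy is always safe.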

Cheers,
Simon


> There is no point in duplicating the element.
> 
> 
> 
>> On p. 197, there's a similar comment: "Any function that has been
>> invoked by either .External or .Call will have all of its arguments
>> protected already. You do not need to protect them. .... [T]hey were
>> not duplicated and should be treated as read-only values."
>> 
>> "should be ... read-only" concerns me. They are "protected" in the
>> garbage collector sense,
> 
> Yes
> 
> 
>> but they are not protected from "return by
>> reference" damage. Right?
>> 
> 
> There is no "return by reference damage".
> 
> The only problem arises if you modify input arguments while someone else holds a reference to them, but there is no way in C to prevent that while still allowing them to be useful. Note that it is legal to modify input arguments if there are no references to them.
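The rule in the last sentence is essentially copy-on-write. A minimal sketch, again on a hypothetical toy object rather than R's API: toy_prepare_write() hands back the object itself when its NAMED-style marker says nothing else refers to it, and a private copy otherwise, so toy_scale() never changes a value that someone else can still see. In R the check would be based on NAMED and the copy made with duplicate().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified object (not R's real layout). */
typedef struct toy_obj {
    int named;      /* 0 = nothing else refers to it; >0 = maybe shared */
    size_t length;
    double *data;
} toy_obj;

/* Returns an object that is safe to write to: x itself if unreferenced,
 * otherwise a fresh copy with named == 0 (NULL on allocation failure). */
static toy_obj *toy_prepare_write(toy_obj *x) {
    if (x->named == 0)
        return x;
    toy_obj *cp = malloc(sizeof *cp);
    if (cp == NULL)
        return NULL;
    cp->named = 0;
    cp->length = x->length;
    cp->data = malloc((x->length ? x->length : 1) * sizeof *cp->data);
    if (cp->data == NULL) {
        free(cp);
        return NULL;
    }
    if (x->length > 0)
        memcpy(cp->data, x->data, x->length * sizeof *cp->data);
    return cp;
}

/* Multiplies every element by k, in place only when that is safe.
 * If a copy was made, the caller owns it. */
static toy_obj *toy_scale(toy_obj *x, double k) {
    toy_obj *y = toy_prepare_write(x);
    if (y == NULL)
        return NULL;
    for (size_t i = 0; i < y->length; i++)
        y->data[i] *= k;
    return y;
}
```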
> 
> Cheers,
> Simon
> 
> 
>> Why doesn't the documentation recommend that function writers mark
>> arguments to C functions as const?  Isn't that what the return-by-value
>> policy would suggest?
>> 
>> Here's a troublesome example in R src/main/array.c:
>> 
>> /* DropDims strips away redundant dimensioning information. */
>> /* If there is an appropriate dimnames attribute the correct */
>> /* element is extracted and attached to the vector as a names */
>> /* attribute.  Note that this function mutates x. */
>> /* Duplication should occur before this is called. */
>> 
>> SEXP DropDims(SEXP x)
>> {
>>   SEXP dims, dimnames, newnames = R_NilValue;
>>   int i, n, ndims;
>> 
>>   PROTECT(x);
>>   dims = getAttrib(x, R_DimSymbol);
>> [... SNIP ....]
>>   setAttrib(x, R_DimNamesSymbol, R_NilValue);
>>   setAttrib(x, R_DimSymbol, R_NilValue);
>>   setAttrib(x, R_NamesSymbol, newnames);
>> [... SNIP ....]
>> 
>>   return x;
>> }
>> 
>> Well, at least there's a warning comment with that one.  But it does
>> show .Call allows call by reference.
>> 
>> Why allow it, though? DropDims could copy x, modify the copy, and return it.
>> 
>> I wondered why DropDims bothers to return x at all. We seem to be
>> using modify and return by reference there.
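One plausible reason for returning x is composability: a mutator that returns its argument lets a caller write the "copy, then modify the copy" pattern Paul suggests as a single expression, while internal callers that already own a fresh object can skip the copy. A toy sketch of that pattern, with hypothetical names and a simplified notion of "redundant" dimensions (extents equal to 1); it is not R's actual DropDims() code.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_MAXDIM 8

/* Hypothetical simplified array object (not R's API). */
typedef struct toy_arr {
    size_t length;
    double *data;
    int ndims;                 /* 0 means "plain vector, no dim" */
    int dims[TOY_MAXDIM];
} toy_arr;

/* In the style of DropDims(): mutates x by removing extents equal to 1
 * (dropping the dim entirely if at most one extent is left) and returns
 * x itself. Callers must duplicate x first if it might be shared. */
static toy_arr *toy_drop_dims(toy_arr *x) {
    int k = 0;
    for (int i = 0; i < x->ndims; i++)
        if (x->dims[i] != 1)
            x->dims[k++] = x->dims[i];
    x->ndims = (k <= 1) ? 0 : k;
    return x;
}

/* Hypothetical deep copy; returns NULL on allocation failure. */
static toy_arr *toy_duplicate(const toy_arr *x) {
    toy_arr *cp = malloc(sizeof *cp);
    if (cp == NULL)
        return NULL;
    *cp = *x;                  /* copies length, ndims and dims by value */
    cp->data = malloc((x->length ? x->length : 1) * sizeof *cp->data);
    if (cp->data == NULL) {
        free(cp);
        return NULL;
    }
    if (x->length > 0)
        memcpy(cp->data, x->data, x->length * sizeof *cp->data);
    return cp;
}

/* Because toy_drop_dims() returns its argument, "copy, then mutate the
 * copy" fits in one expression. */
static toy_arr *toy_drop_dims_copy(const toy_arr *x) {
    toy_arr *cp = toy_duplicate(x);
    return cp ? toy_drop_dims(cp) : NULL;
}
```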
>> 
>> I also wondered why x is PROTECTed, actually. It's an argument; wasn't
>> it automatically protected? Is it no longer protected after the
>> function starts modifying it?
>> 
>> Here's an example with similar usage in Writing R Extensions, section
>> 5.10.1 "Calling .Call". It protects the arguments a and b (needed
>> ??), then changes them.
>> 
>> #include <R.h>
>> #include <Rdefines.h>
>> 
>>    SEXP convolve2(SEXP a, SEXP b)
>>    {
>>        R_len_t i, j, na, nb, nab;
>>        double *xa, *xb, *xab;
>>        SEXP ab;
>> 
>>        PROTECT(a = AS_NUMERIC(a)); /* PJ wonders, doesn't this alter "a" in calling code? */
>>        PROTECT(b = AS_NUMERIC(b));
>>        na = LENGTH(a); nb = LENGTH(b); nab = na + nb - 1;
>>        PROTECT(ab = NEW_NUMERIC(nab));
>>        xa = NUMERIC_POINTER(a); xb = NUMERIC_POINTER(b);
>>        xab = NUMERIC_POINTER(ab);
>>        for(i = 0; i < nab; i++) xab[i] = 0.0;
>>        for(i = 0; i < na; i++)
>>             for(j = 0; j < nb; j++) xab[i + j] += xa[i] * xb[j];
>>        UNPROTECT(3);
>>        return(ab);
>>    }
>> 
>> 
>> Doesn't
>> 
>>       PROTECT(a = AS_NUMERIC(a));
>> 
>> have to alter the data structure "a" both inside the C function and
>> in the calling R code as well? And, if a was PROTECTED automatically,
>> could we do without that PROTECT()?
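On the first question: in C, `a = AS_NUMERIC(a)` only rebinds the function's local copy of the pointer, so the caller's object is untouched. In Rdefines.h, AS_NUMERIC() is essentially coerceVector(a, REALSXP), which returns a itself if it is already a double vector and otherwise allocates a new one. That new object is not one of the arguments R protected on entry, which is why the result is PROTECTed. A toy model of both points, with hypothetical types (toy_vec, toy_as_numeric()) standing in for R's:

```c
#include <stdlib.h>

/* Hypothetical toy vector holding either ints or doubles (not R's API). */
typedef enum { TOY_INT, TOY_REAL } toy_type;

typedef struct toy_vec {
    toy_type type;
    size_t length;
    int *ivals;      /* used when type == TOY_INT */
    double *dvals;   /* used when type == TOY_REAL */
} toy_vec;

/* Conceptual analogue of AS_NUMERIC(): return x itself if it already holds
 * doubles, otherwise a NEW converted vector. x is never modified.
 * Returns NULL on allocation failure. */
static toy_vec *toy_as_numeric(toy_vec *x) {
    if (x->type == TOY_REAL)
        return x;
    toy_vec *y = malloc(sizeof *y);
    if (y == NULL)
        return NULL;
    y->type = TOY_REAL;
    y->length = x->length;
    y->ivals = NULL;
    y->dvals = malloc((x->length ? x->length : 1) * sizeof *y->dvals);
    if (y->dvals == NULL) {
        free(y);
        return NULL;
    }
    for (size_t i = 0; i < x->length; i++)
        y->dvals[i] = (double) x->ivals[i];
    return y;
}

/* Sums a as doubles into *out; returns 0 on success, -1 on failure.
 * Assigning to the parameter 'a' rebinds only this function's local
 * pointer: the caller's variable still points at its original object. */
static int toy_sum_as_double(toy_vec *a, double *out) {
    toy_vec *orig = a;
    a = toy_as_numeric(a);          /* compare: a = AS_NUMERIC(a) */
    if (a == NULL)
        return -1;
    double s = 0.0;
    for (size_t i = 0; i < a->length; i++)
        s += a->dvals[i];
    if (a != orig) {                /* we allocated the copy, so free it */
        free(a->dvals);
        free(a);
    }
    *out = s;
    return 0;
}
```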
>> 
>> pj
>> 
>> -- 
>> Paul E. Johnson
>> Professor, Political Science      Assoc. Director
>> 1541 Lilac Lane, Room 504      Center for Research Methods
>> University of Kansas                 University of Kansas
>> http://pj.freefaculty.org               http://quant.ku.edu
>> 
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>> 
>> 