[Rd] [patch] Behavior of .C() and .Fortran() when given double(0) or integer(0).

Pavel N. Krivitsky krivitsky at stat.psu.edu
Sun May 6 21:54:13 CEST 2012


Oops... I forgot to attach dotC_NULL.c, the C source file for the test
case.

                  Pavel Krivitsky

On Fri, 2012-05-04 at 13:42 -0400, Pavel N. Krivitsky wrote:
> Dear R-devel,
> 
> While tracking down some hard-to-reproduce bugs in a package I maintain,
> I stumbled on a behavior change between R 2.15.0 and the current R-devel
> (or SVN trunk).
> 
> In 2.15.0 and earlier, if you passed a 0-length vector of the right
> mode (e.g., double(0) or integer(0)) as one of the arguments in a .C()
> call with DUP=TRUE (the default), the C routine would be passed NULL
> (the C pointer, not R NULL) in the corresponding argument. The current
> development version instead passes it a pointer to what appears to be
> the memory location immediately following the SEXP that holds the
> metadata for the argument. If the argument has length 0, this is often
> memory belonging to a different R object. (DUP=FALSE in 2.15.0
> appears to have the same behavior as R-devel.)
> 
> .C() documentation and Writing R Extensions don't explicitly specify a
> behavior for 0-length vectors, so I don't know if this change is
> intentional, or whether it was a side-effect of the following news item:
> 
>       .C() and .Fortran() do less copying: arguments which are raw,
>       logical, integer, real or complex vectors and are unnamed are not
>       copied before the call, and (named or not) are not copied after
>       the call.  Lists are no longer copied (they are supposed to be
>       used read-only in the C code).
> 
> Was the change in the empty vector behavior intentional?
> 
> It seems to me that standardizing on passing NULL to the C routine is
> safer, more consistent with other memory-related routines, and more
> convenient. Safer, because dereferencing a NULL pointer is an immediate
> (and therefore easily traced) segfault, whereas dereferencing an
> invalid pointer that is nevertheless in the general memory area
> allocated to the program often causes subtle errors down the line.
> More consistent, because R_alloc asked to allocate 0 bytes returns
> NULL, at least on my platform. More convenient, because the C routine
> can easily check whether a pointer is NULL, whereas with the R-devel
> behavior the programmer has to add an explicit way of telling that an
> empty vector was passed.
> 
> I've attached a small test case (dotC_NULL.* files) that shows the
> difference. The C file should be built with R CMD SHLIB, and the R file
> calls the functions in the library with a variety of arguments. The
> outputs I get from running
> R CMD BATCH --no-timing --vanilla --slave dotC_NULL.R
> on R 2.15.0, R trunk, and R trunk with my patch (described below) are attached.
> 
> The attached patch (dotC_NULL.patch) against the current trunk
> (affecting src/main/dotcode.c) restores the old behavior for DUP=TRUE
> (i.e., 0-length vector -> NULL pointer) and extends it to the DUP=FALSE
> case. For each argument of mode raw, integer, real, or complex in a
> .C() or .Fortran() call, it checks whether the argument has length 0
> and, if so, sets the pointer to be passed to NULL and skips copying
> the C routine's changes back to the R object for that argument. The
> additional computing cost should be negligible (i.e., checking whether
> the vector length equals 0 and break-ing out of a switch statement if
> so).
> 
> The patch appears to work, at least for my package, and R CMD check
> passes for all recommended packages (on my 64-bit Linux system), but
> this is my first time working with R's internals, so handle with care.
> 
>                                    Best,
>                                    Pavel Krivitsky
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

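The attachments are not reproduced here. As a rough illustration of the
point in the quoted message about checking for NULL on the C side, here
is a minimal sketch of a routine written for the .C() interface. The
routine name sum_or_flag and its arguments are hypothetical and are not
taken from dotC_NULL.c. It reports whether it received a NULL pointer,
and it also takes the length explicitly, which is the safeguard a
routine needs if a 0-length vector can arrive as a non-NULL pointer.

```c
#include <stddef.h>

/* Hypothetical routine intended to be called through .C(); the name and
 * argument layout are assumptions for illustration only.
 *
 *   x        : data vector; may be NULL if the caller maps a 0-length
 *              vector to a NULL pointer
 *   n        : number of elements in x, passed explicitly
 *   total    : output, sum of the first *n elements (0 if none)
 *   was_null : output, 1 if x was NULL, 0 otherwise
 */
void sum_or_flag(double *x, int *n, double *total, int *was_null)
{
    *total = 0.0;
    *was_null = (x == NULL) ? 1 : 0;

    /* Never touch x when it is NULL or when the declared length is 0:
     * a non-NULL pointer for an empty vector need not point at memory
     * that belongs to that vector. */
    if (x == NULL || *n <= 0)
        return;

    for (int i = 0; i < *n; i++)
        *total += x[i];
}
```

Built with R CMD SHLIB and loaded with dyn.load(), a call along the
lines of .C("sum_or_flag", double(0), 0L, total = double(1),
was_null = integer(1)) would, going by the description above, report
was_null as 1 on R 2.15.0 with DUP=TRUE and as 0 on the current R-devel.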