[Rd] some questions about R internal SEXP types

Sun Sep 6 03:44:01 CEST 2020

Hello,

I am writing an R/Go interoperability tool[1] that works similarly to
Rcpp; the tool takes packages written in Go and performs the necessary
Go type analysis to wrap the Go code with C and R shims that allow the
Go code to then be called from R. The system is largely complete (with
the exception of a clean approach to handling generalised
attributes in the easy case[2]; the less hand-holding case does handle
these). Testing of some of the code is unfortunately lacking because of
the difficulties of testing across environments.

To make the system flexible I have provided an (intentionally
incomplete) Go API into the R internals which allows reasonably
type-safe interaction with SEXP values from Go (Go does not have unions,
so this is uglier than it might otherwise be; unions are faked with Go
interface values). For efficiency reasons I've avoided using R internal
calls where possible (accessors are done with Go code directly, but
allocations are done in R's C code to avoid having to duplicate the
garbage collection mechanics in Go with the obvious risks of error and
possible behaviour skew in the future).
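
As a rough illustration of what faking a union with Go interface values can look like, here is a minimal sketch. The names Value, Integer, Character, List and Describe are hypothetical and chosen for this example only; they are not the rgo API, and no R memory is touched.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for a tagged SEXP union: each
// concrete type below plays the role of one union member.
type Value interface {
	kind() string
}

// Integer, Character and List are illustrative only; they are not the
// types used by rgo.
type Integer struct{ Data []int32 }

type Character struct{ Data []string }

type List struct {
	Car Value
	Cdr *List
}

func (*Integer) kind() string   { return "INTSXP" }
func (*Character) kind() string { return "STRSXP" }
func (*List) kind() string      { return "LISTSXP" }

// Describe recovers the concrete member with a type switch, which is
// the Go counterpart of checking a union's tag in C.
func Describe(v Value) string {
	switch v := v.(type) {
	case *Integer:
		return fmt.Sprintf("%s of length %d", v.kind(), len(v.Data))
	case *Character:
		return fmt.Sprintf("%s of length %d", v.kind(), len(v.Data))
	case *List:
		return v.kind()
	case nil:
		return "nil"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(Describe(&Integer{Data: []int32{1, 2, 3}}))
	fmt.Println(Describe(&Character{Data: []string{"a"}}))
	fmt.Println(Describe(nil))
}
```

Every caller that receives a Value has to go through a switch or assertion like this one, which is the cost that a more specific return type avoids.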

In doing this work I have some questions that I have not been able to
find answers for in the R-ints doc or hadley/r-internals.

   1. In R-ints, the CDR of the LISTSXP SEXP type is said to hold "usually"
      LISTSXP or NULL. What does the "usually" mean here? Is it possible
      for the CDR to hold values other than LISTSXP or NULL, and is
      this NULL the NILSXP or C NULL? I assume that the CAR can hold any
      type of SEXP; is this correct?
   2. The LANGSXP and DOTSXP types are lists, but the R-ints comments on
      them do not say whether the CDR of one of these lists has the same
      type as the head of the list or devolves to a LISTSXP. Looking
      through the code suggests to me that functions that allocate these
      two types
      allocate a LISTSXP and then change only the head of the list to be
      the LANGSXP or DOTSXP that's required, meaning that the tail of the
      list is all LISTSXP. Is this correct?
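
The layout asked about in question 2 can be modelled in a few lines of Go. This is only a toy model of the reading described above (allocate a plain list, then retag the head); it is an assumption about R's behaviour, not a confirmed description of it, and Kind, Cell, newList, newLang and kinds are hypothetical names.

```go
package main

import "fmt"

// Kind is a toy stand-in for R's SEXPTYPE tag.
type Kind int

const (
	NILSXP Kind = iota
	LISTSXP
	LANGSXP
)

// Cell is a toy cons cell; a nil Cdr stands in for R_NilValue here.
type Cell struct {
	Kind Kind
	Car  interface{}
	Cdr  *Cell
}

// newList builds a chain of n cells, all tagged LISTSXP.
func newList(n int) *Cell {
	var head *Cell
	for i := 0; i < n; i++ {
		head = &Cell{Kind: LISTSXP, Cdr: head}
	}
	return head
}

// newLang models the reading given in question 2: allocate a plain
// list and retag only its first cell. This is an assumption about R,
// not a confirmed description of its allocator.
func newLang(n int) *Cell {
	head := newList(n)
	if head != nil {
		head.Kind = LANGSXP
	}
	return head
}

// kinds collects the tag of each cell from head to tail.
func kinds(c *Cell) []Kind {
	var ks []Kind
	for ; c != nil; c = c.Cdr {
		ks = append(ks, c.Kind)
	}
	return ks
}

func main() {
	// Under the assumption above, only the first tag differs.
	fmt.Println(kinds(newLang(3)))
}
```

If this reading is right, a Go accessor for the CDR of a LANGSXP or DOTSXP would return the same list type as the CDR of a LISTSXP.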

The last question is more a question of interest in design strategy,
and the answer may have been lost to time. In order to reduce the need
to go through Go's interface assertions in a number of cases I have
decided to reinterpret R_NilValue to an untyped Go nil (this is
important, for example, in list traversal, where the CDR can (hopefully)
be only one of two types, LISTSXP or NILSXP; in Go this would require a
generalised SEXP return, but by doing this reinterpretation I can
return a *List pointer which may be nil, greatly simplifying the code
and improving the performance). My question here is why a singleton null
value was chosen to be represented as a fully allocated SEXP value
rather than just a C NULL. Also, whether C NULL is used to any great
extent within the internal code. Note that the Go API provides a
mechanism to easily reconvert the nils used back to R_NilValue when
returning from a Go function[3].
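
To make the nil reinterpretation concrete, here is a minimal sketch of list traversal with a *List that may be nil, together with a reconversion step. It is a toy model under stated assumptions: SEXP, nilValue, List, Len and export are hypothetical names for this example, and export only imitates the idea of the mechanism in [3] rather than reproducing it.

```go
package main

import "fmt"

// SEXP is a toy stand-in for an R object handle.
type SEXP struct{ name string }

// nilValue plays the role of the R_NilValue singleton in this sketch.
var nilValue = &SEXP{name: "R_NilValue"}

// List is a toy pairlist cell. A nil *List represents R_NilValue on
// the Go side, so traversal needs no interface assertion.
type List struct {
	car  string
	cdr  *List
	sexp *SEXP
}

// Cdr returns the rest of the list, or nil at the end. It is safe to
// call on a nil receiver.
func (l *List) Cdr() *List {
	if l == nil {
		return nil
	}
	return l.cdr
}

// Len walks the list using only a nil check on the *List pointer.
func Len(l *List) int {
	n := 0
	for ; l != nil; l = l.Cdr() {
		n++
	}
	return n
}

// export is a hypothetical reconversion step: a Go nil is mapped back
// to the singleton before a value is handed back to R.
func export(l *List) *SEXP {
	if l == nil {
		return nilValue
	}
	return l.sexp
}

func main() {
	tail := &List{car: "b", sexp: &SEXP{name: "tail"}}
	head := &List{car: "a", cdr: tail, sexp: &SEXP{name: "head"}}
	fmt.Println(Len(head))
	fmt.Println(export(tail.Cdr()) == nilValue)
}
```

The design choice is that the end-of-list test becomes a plain pointer comparison in Go, with the singleton restored only at the boundary back to R.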

thanks
Dan Kortschak

[1]https://github.com/rgonomic/rgo
[2]https://github.com/rgonomic/rgo/issues/1
[3]https://pkg.go.dev/github.com/rgonomic/rgo/sexp?tab=doc#Value.Export