[Rd] clarifying and adjusting the C API for R

Travers Ching tr@ver@c @end|ng |rom gm@||@com
Sun Jun 9 21:25:46 CEST 2024


Hi Luke, thanks for all your work on R!

I'd like to ask specifically about R_Serialize / R_Unserialize (and
associated helper functions). These are used by at least a handful of
packages and I don't see them in the list from Yutani.

Are these API functions considered "stable"?

Best,
Travers

On Sat, Jun 8, 2024 at 9:29 PM Hiroaki Yutani <yutani.ini using gmail.com> wrote:
>
> Thanks so much for your wonderful work, Luke!
> I didn't expect such a clarification to happen this soon. This is really
> great.
>
> For convenience, I created a quick web page to search the result of
> tools:::funAPI().
>
> https://yutannihilation.github.io/R-fun-API/
>
> Hope this helps those who are too lazy to install R-devel to check.
>
> Best,
> Yutani
>
> On Thu, Jun 6, 2024 at 11:47 PM luke-tierney--- via R-devel <r-devel using r-project.org> wrote:
>
> > This is an update on some current work on the C API for use in R
> > extensions.
> >
> > The internal R implementation makes use of tens of thousands of C
> > entry points. On Linux and Windows, which support visibility
> > restrictions, most of these are visible only within the R executable or
> > shared library. About 1500 are not hidden and are visible to
> > dynamically loaded shared libraries, such as ones in packages, and to
> > embedding applications.
> >
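A minimal sketch of what such a visibility restriction can look like, assuming GCC or Clang on an ELF platform such as Linux and code compiled into a shared library. The macro and function names (`ATTR_HIDDEN`, `ATTR_VISIBLE`, `internal_scale`, `public_entry`) are hypothetical and are not taken from R's sources; Windows toolchains typically control exports through other mechanisms.

```c
/* Hypothetical library source; illustrative only, not R's code. */
#if defined(__GNUC__) || defined(__clang__)
# define ATTR_HIDDEN  __attribute__((visibility("hidden")))
# define ATTR_VISIBLE __attribute__((visibility("default")))
#else
/* Fallback for other compilers: no visibility control in this sketch. */
# define ATTR_HIDDEN
# define ATTR_VISIBLE
#endif

/* Internal helper: callable from any file inside the library, but
   hidden from code that loads the shared library dynamically. */
ATTR_HIDDEN int internal_scale(int x)
{
    return 2 * x;
}

/* Public entry point: remains visible to dynamically loaded clients. */
ATTR_VISIBLE int public_entry(int x)
{
    return internal_scale(x) + 1;
}
```

With both functions compiled into a shared library, only `public_entry` would be resolvable by a dynamically loaded client, while `internal_scale` stays callable from other files within the library itself.
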
> > There are two main reasons for limiting access to entry points in a
> > software framework:
> >
> > - Some entry points are very easy to use in ways that corrupt internal
> >    data, leading to segfaults or, worse, incorrect computations without
> >    segfaults.
> >
> > - Some entry points expose internal structure and other implementation
> >    details, which makes it hard to make improvements without breaking
> >    client code that has come to depend on these details.
> >
> > The API of C entry points that can be used in R extensions, both for
> > packages and embedding, has evolved organically over many years. The
> > definition for the current release, as expressed in the Writing R
> > Extensions manual (WRE), is roughly:
> >
> >      An entry point can be used if (1) it is declared in a header file
> >      in R.home("include"), and (2) it is documented for use in WRE.
> >
> > Ideally, (1) would be necessary and sufficient, but for a variety of
> > reasons that isn't achievable, at least not in the near term. (2) can
> > be challenging to determine; in particular, it is not amenable to a
> > computational answer.
> >
> > An experimental effort is underway to add annotations to the WRE
> > Texinfo source to allow (2) to be answered unambiguously. The
> > annotations so far mostly reflect my reading of WRE and may be revised
> > as they are reviewed by others. The annotated document can be used for
> > programmatically identifying what is currently considered part of the C
> > API. The result so far is an experimental function tools:::funAPI():
> >
> >      > head(tools:::funAPI())
> >                      name                    loc apitype
> >      1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
> >      2        alloc3DArray                    WRE     api
> >      3          allocArray                    WRE     api
> >      4           allocLang                    WRE     api
> >      5           allocList                    WRE     api
> >      6         allocMatrix                    WRE     api
> >
> > The 'apitype' field has three possible levels:
> >
> >      | api  | stable (ideally) API |
> >      | eapi | experimental API     |
> >      | emb  | embedding API        |
> >
> > Entry points in the embedding API would typically only be used in
> > applications embedding R or providing new front ends, but might be
> > reasonable to use in packages that support embedding.
> >
> > The 'loc' field indicates how the entry point is identified as part of
> > an API: explicit mention in WRE, or declaration in a header file
> > identified as fully part of an API.
> >
> > [tools:::funAPI() may not be completely accurate as it relies on
> > regular expressions for examining header files considered part of the
> > API rather than proper parsing. But it seems to be pretty close to
> > what can be achieved with proper parsing.  Proper parsing would add
> > dependencies on additional tools, which I would like to avoid for
> > now. One dependency already present is that a C compiler has to be on
> > the search path and cc -E has to run the C pre-processor.]
> >
> > Two additional experimental functions are available for analyzing
> > package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
> > These examine installed packages.
> >
> > [These may produce some false positives on macOS; they may or may not
> > work on Windows at this point.]
> >
> > Using these tools initially showed around 200 non-API entry points
> > used across packages on CRAN and BIOC. Ideally this number should be
> > reduced to zero. This will require a combination of additions to the
> > API and changes in packages.
> >
> > Some entry points can safely be added to the API. Around 40 have
> > already been added to WRE with API annotations; another 40 or so can
> > probably be added after review.
> >
> > The remainder mostly fall into two groups:
> >
> > - Entry points that should never be used in packages, such as
> >    SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
> >    matter) that can create inconsistent or corrupt internal state.
> >
> > - Entry points that depend on the existence of internal structure that
> >    might be subject to change, such as the existence of promise objects
> >    or internal structure of environments.
> >
> > Many, if not most, of these seem to be used in idioms that can either
> > be accomplished with existing higher-level functions already in the
> > API, or by new higher-level functions that can be created and
> > added. Working through these will take some time and coordination
> > between R-core and maintainers of affected packages.
> >
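The distinction drawn above between raw setters and higher-level functions can be illustrated with a toy example. This is only a sketch: `toy_vec` and its functions are hypothetical and do not reflect R's actual vector representation or the behavior of SETLENGTH. It shows how a setter that updates one field in isolation can leave an object inconsistent, while a higher-level function keeps the invariant intact.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy vector (hypothetical, not R's layout).
   Invariant: length <= allocated. */
typedef struct {
    size_t length;     /* number of elements in use */
    size_t allocated;  /* number of elements data can hold */
    double *data;
} toy_vec;

/* Low-level setter: changes the length field with no checks, so the
   caller can silently break the invariant. */
void toy_set_length(toy_vec *v, size_t n)
{
    v->length = n;
}

/* Higher-level function: grows the storage when needed, zero-fills any
   new elements, and only then updates the length.
   Returns 0 on success and -1 if memory could not be allocated. */
int toy_resize(toy_vec *v, size_t n)
{
    if (n > v->allocated) {
        double *p = realloc(v->data, n * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->allocated = n;
    }
    for (size_t i = v->length; i < n; i++)
        v->data[i] = 0.0;
    v->length = n;
    return 0;
}

/* Checks the invariant that the raw setter can violate. */
int toy_is_consistent(const toy_vec *v)
{
    return v->length <= v->allocated;
}
```

A client that only ever calls `toy_resize` cannot reach the inconsistent state, which is the kind of property that exposing the raw setter gives up.
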
> > Once things have gelled a bit more I hope to turn this into a blog
> > post that will include some examples of moving non-API entry point
> > uses into compliance.
> >
> > Best,
> >
> > luke
> >
> > --
> > Luke Tierney
> > Ralph E. Wareham Professor of Mathematical Sciences
> > University of Iowa                  Phone:             319-335-3386
> > Department of Statistics and        Fax:               319-335-3017
> >     Actuarial Science
> > 241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> > Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
> >
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>


