[Rd] [External] Re: clarifying and adjusting the C API for R

Reed A. Cartwright r@c@rtwr|ght @end|ng |rom gm@||@com
Sat Jun 8 02:06:46 CEST 2024


Would it be reasonable to move the non-API stuff that cannot be hidden
into header files inside a "details" directory (or some other specific
naming scheme)?

That's what I use when I need to separate a public API from an internal API.
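
As a rough sketch of what I mean, collapsed into one file so it compiles
on its own. The paths named in the comments (include/counter.h,
include/details/counter_impl.h) and all of the counter_* names are made up
for illustration and are not taken from R's headers.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- would live in include/counter.h (public API) ---- */
typedef struct counter counter;            /* opaque to client code */
counter *counter_new(void);
void     counter_increment(counter *c);
int      counter_value(const counter *c);
void     counter_free(counter *c);

/* ---- would live in include/details/counter_impl.h (non-API) ---- */
/* Cannot be hidden because other components of the project need it,
   but the directory name signals that clients must not rely on it. */
struct counter {
    int value;
};
void counter_details_reset(counter *c);    /* internal helper */

/* ---- would live in src/counter.c ---- */
counter *counter_new(void)
{
    counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->value = 0;
    return c;
}

void counter_increment(counter *c)
{
    assert(c != NULL);
    c->value++;
}

int counter_value(const counter *c)
{
    assert(c != NULL);
    return c->value;
}

void counter_free(counter *c)
{
    free(c);
}

void counter_details_reset(counter *c)
{
    assert(c != NULL);
    c->value = 0;
}
```

Client code would include only counter.h and see `counter` as an opaque
type. Anything that includes a header from details/ is visibly opting into
the non-API part.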


On Fri, Jun 7, 2024 at 7:30 AM luke-tierney--- via R-devel
<r-devel using r-project.org> wrote:
>
> On Fri, 7 Jun 2024, Steven Dirkse wrote:
>
> > Thanks for sharing this overview of an interesting and much-needed project.
> > You mention that R exports about 1500 symbols (on platforms supporting
> > visibility) but this subject isn't mentioned explicitly again in your note,
> > so I'm wondering how things tie together.  Un-exported symbols cannot be
> > part of the API - how would people use them in this case?  In a perfect
> > world the set of exported symbols could define the API or match it exactly,
> > but I guess that isn't the case at present.  So I conclude that R exports
> > extra (i.e. non-API) symbols.  Is part of the goal to remove these extra
> > exports?
>
> No. We'll hide what we can, but base packages, for one, need access to
> some entry points that should not be in the API, so those have to stay
> un-hidden.
>
> Best,
>
> luke
>
> >
> > -Steve
> >
> > On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel
> > <r-devel using r-project.org> wrote:
> >       This is an update on some current work on the C API for use in R
> >       extensions.
> >
> >       The internal R implementation makes use of tens of thousands of C
> >       entry points. On Linux and Windows, which support visibility
> >       restrictions, most of these are visible only within the R
> >       executable or shared library. About 1500 are not hidden and are
> >       visible to dynamically loaded shared libraries, such as ones in
> >       packages, and to embedding applications.
> >
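
For readers less familiar with visibility restrictions, here is a minimal
sketch of how a shared library can hide an entry point using the GCC/Clang
visibility attribute, as used on ELF platforms such as Linux. The demo_*
function names and ATTR_* macros are hypothetical and exist only for this
illustration; Windows toolchains typically control exports by other means,
such as __declspec(dllexport).

```c
#include <assert.h>
#include <stddef.h>

/* GCC and Clang accept this attribute; for other compilers the macros
   expand to nothing and every symbol keeps default visibility. */
#if defined(__GNUC__) || defined(__clang__)
# define ATTR_HIDDEN __attribute__((visibility("hidden")))
# define ATTR_EXPORT __attribute__((visibility("default")))
#else
# define ATTR_HIDDEN
# define ATTR_EXPORT
#endif

/* Hidden: callable from any file linked into the same shared object,
   but not resolvable by name from other dynamically loaded libraries. */
ATTR_HIDDEN int demo_internal_sum(const int *xs, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += xs[i];
    return total;
}

/* Exported: remains in the dynamic symbol table for clients to call. */
ATTR_EXPORT int demo_public_sum(const int *xs, int n)
{
    assert(n == 0 || xs != NULL);
    return demo_internal_sum(xs, n);
}
```
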
> >       There are two main reasons for limiting access to entry points in a
> >       software framework:
> >
> >       - Some entry points are very easy to use in ways that corrupt
> >          internal data, leading to segfaults or, worse, incorrect
> >          computations without segfaults.
> >
> >       - Some entry points expose internal structure and other
> >          implementation details, which makes it hard to make improvements
> >          without breaking client code that has come to depend on these
> >          details.
> >
> >       The API of C entry points that can be used in R extensions, both for
> >       packages and embedding, has evolved organically over many years. The
> >       definition for the current release expressed in the Writing R
> >       Extensions manual (WRE) is roughly:
> >
> >            An entry point can be used if (1) it is declared in a header
> >            file in R.home("include"), and (2) if it is documented for use
> >            in WRE.
> >
> >       Ideally, (1) would be necessary and sufficient, but for a variety of
> >       reasons that isn't achievable, at least not in the near term. (2) can
> >       be challenging to determine; in particular, it is not amenable to a
> >       computational answer.
> >
> >       An experimental effort is underway to add annotations to the WRE
> >       Texinfo source to allow (2) to be answered unambiguously. The
> >       annotations so far mostly reflect my reading of WRE and may be
> >       revised as they are reviewed by others. The annotated document can
> >       be used for programmatically identifying what is currently
> >       considered part of the C API. The result so far is an experimental
> >       function tools:::funAPI():
> >
> >            > head(tools:::funAPI())
> >                            name                    loc apitype
> >            1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
> >            2        alloc3DArray                    WRE     api
> >            3          allocArray                    WRE     api
> >            4           allocLang                    WRE     api
> >            5           allocList                    WRE     api
> >            6         allocMatrix                    WRE     api
> >
> >       The 'apitype' field has three possible levels:
> >
> >            | api  | stable (ideally) API |
> >            | eapi | experimental API     |
> >            | emb  | embedding API        |
> >
> >       Entry points in the embedded API would typically only be used in
> >       applications embedding R or providing new front ends, but might be
> >       reasonable to use in packages that support embedding.
> >
> >       The 'loc' field indicates how the entry point is identified as part
> >       of an API: explicit mention in WRE, or declaration in a header file
> >       identified as fully part of an API.
> >
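
To make the three columns concrete, here is a small sketch that models the
rows quoted above as a C table. The type and function names (api_type,
api_entry, api_rows, api_type_code, api_listed_in_wre, api_count) are
invented for this example; only the six rows and the three apitype codes
come from the quoted output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { API_STABLE, API_EXPERIMENTAL, API_EMBEDDING } api_type;

typedef struct {
    const char *name;  /* entry point name */
    const char *loc;   /* "WRE", or a header identified as part of an API */
    api_type    type;
} api_entry;

/* The six rows shown by head(tools:::funAPI()) in the quoted message. */
static const api_entry api_rows[] = {
    { "Rf_AdobeSymbol2utf8", "R_ext/GraphicsDevice.h", API_EXPERIMENTAL },
    { "alloc3DArray",        "WRE",                    API_STABLE },
    { "allocArray",          "WRE",                    API_STABLE },
    { "allocLang",           "WRE",                    API_STABLE },
    { "allocList",           "WRE",                    API_STABLE },
    { "allocMatrix",         "WRE",                    API_STABLE },
};
static const size_t api_rows_len = sizeof api_rows / sizeof api_rows[0];

/* Map the enum back to the code used in the 'apitype' column. */
const char *api_type_code(api_type t)
{
    switch (t) {
    case API_STABLE:       return "api";
    case API_EXPERIMENTAL: return "eapi";
    case API_EMBEDDING:    return "emb";
    }
    return "unknown";
}

/* True when the entry is identified by explicit mention in WRE rather
   than by the header that declares it. */
int api_listed_in_wre(const api_entry *e)
{
    assert(e != NULL);
    return strcmp(e->loc, "WRE") == 0;
}

/* Count the rows that carry a given apitype. */
size_t api_count(const api_entry *rows, size_t n, api_type t)
{
    assert(rows != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (rows[i].type == t)
            count++;
    return count;
}
```
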
> >       [tools:::funAPI() may not be completely accurate as it relies on
> >       regular expressions for examining header files considered part of
> >       the API rather than proper parsing. But it seems to be pretty close
> >       to what can be achieved with proper parsing.  Proper parsing would
> >       add dependencies on additional tools, which I would like to avoid
> >       for now. One dependency already present is that a C compiler has to
> >       be on the search path and cc -E has to run the C pre-processor.]
> >
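
The trade-off between regular expressions and proper parsing can be seen
in a small sketch. This is not how tools:::funAPI() is implemented; it is
a hypothetical helper, extract_decl_name(), built on POSIX <regex.h>, that
takes the identifier directly before the first '(' in a one-line
declaration. The declaration strings used with it are illustrative and not
copied from R's headers.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the identifier that directly precedes the first '(' in decl into
   out (at most outlen - 1 characters plus a terminating NUL).
   Returns 1 on success, 0 if nothing matched or the name does not fit. */
int extract_decl_name(const char *decl, char *out, size_t outlen)
{
    assert(decl != NULL && out != NULL && outlen > 0);

    regex_t re;
    regmatch_t m[2];
    int found = 0;

    /* identifier, optional whitespace, then a literal '(' */
    if (regcomp(&re, "([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*\\(",
                REG_EXTENDED) != 0)
        return 0;

    if (regexec(&re, decl, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, decl + m[1].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

For a plain prototype such as "SEXP allocMatrix(SEXPTYPE, int, int);" this
yields "allocMatrix", and for a variable declaration with no parenthesis
it reports no match. For a function-pointer declaration such as
"void (*handler)(int);" it yields "void", which is the kind of inaccuracy
that proper parsing would avoid.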
> >       Two additional experimental functions are available for analyzing
> >       package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
> >       These examine installed packages.
> >
> >       [These may produce some false positives on macOS; they may or may
> >       not work on Windows at this point.]
> >
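
Conceptually, a compliance check of this kind amounts to a set difference:
the entry points a package references, minus those in the API. The sketch
below shows only that idea. The function names (in_api, find_non_api) are
hypothetical, and how the real tools obtain the list of referenced symbols
from an installed package is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name appears in api[0..n_api-1], 0 otherwise. */
static int in_api(const char *name, const char *const *api, size_t n_api)
{
    for (size_t i = 0; i < n_api; i++)
        if (strcmp(name, api[i]) == 0)
            return 1;
    return 0;
}

/* Store pointers to the names in used[] that are not in api[] into out,
   which must have room for n_used entries. Returns how many were stored;
   the order of used[] is preserved. */
size_t find_non_api(const char *const *used, size_t n_used,
                    const char *const *api, size_t n_api,
                    const char **out)
{
    assert(used != NULL && api != NULL && out != NULL);
    size_t n_bad = 0;
    for (size_t i = 0; i < n_used; i++)
        if (!in_api(used[i], api, n_api))
            out[n_bad++] = used[i];
    return n_bad;
}
```

A package would be compliant in this sense when find_non_api() returns
zero for its list of referenced entry points.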
> >       Using these tools initially showed around 200 non-API entry points
> >       used across packages on CRAN and BIOC. Ideally this number should be
> >       reduced to zero. This will require a combination of additions to the
> >       API and changes in packages.
> >
> >       Some entry points can safely be added to the API. Around 40 have
> >       already been added to WRE with API annotations; another 40 or so can
> >       probably be added after review.
> >
> >       The remainder mostly fall into two groups:
> >
> >       - Entry points that should never be used in packages, such as
> >          SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
> >          matter) that can create inconsistent or corrupt internal state.
> >
> >       - Entry points that depend on the existence of internal structure
> >          that might be subject to change, such as the existence of promise
> >          objects or internal structure of environments.
> >
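
The first group can be illustrated with a toy example of why a raw setter
is dangerous and what a checked alternative looks like. The toy_vec type
and toy_* functions are invented for this sketch; they do not reflect R's
internal vector layout or the actual behavior of SETLENGTH.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double *data;
    size_t  length;    /* elements in use */
    size_t  capacity;  /* elements allocated; invariant: length <= capacity */
} toy_vec;

/* Returns 0 on success, -1 if allocation fails. */
int toy_vec_init(toy_vec *v, size_t capacity)
{
    assert(v != NULL);
    v->data = NULL;
    v->length = 0;
    v->capacity = 0;
    if (capacity > 0) {
        v->data = malloc(capacity * sizeof *v->data);
        if (v->data == NULL)
            return -1;
        v->capacity = capacity;
    }
    return 0;
}

/* Internal-style setter: trusts the caller and can break the invariant. */
void toy_set_length_unchecked(toy_vec *v, size_t n)
{
    assert(v != NULL);
    v->length = n;
}

/* API-style alternative: grows the storage when needed and zero-fills
   new elements. Returns 0 on success, -1 if allocation fails. */
int toy_vec_resize(toy_vec *v, size_t n)
{
    assert(v != NULL);
    if (n > v->capacity) {
        double *p = realloc(v->data, n * sizeof *p);
        if (p == NULL)
            return -1;
        v->data = p;
        v->capacity = n;
    }
    for (size_t i = v->length; i < n; i++)
        v->data[i] = 0.0;
    v->length = n;
    return 0;
}

int toy_vec_is_consistent(const toy_vec *v)
{
    assert(v != NULL);
    return v->length <= v->capacity;
}

void toy_vec_free(toy_vec *v)
{
    assert(v != NULL);
    free(v->data);
    v->data = NULL;
    v->length = 0;
    v->capacity = 0;
}
```

After toy_set_length_unchecked() claims more elements than were allocated,
any later read or write through the vector can run past the buffer, while
toy_vec_resize() keeps the invariant intact for every request it accepts.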
> >       Many, if not most, of these seem to be used in idioms that can either
> >       be accomplished with existing higher-level functions already in the
> >       API, or by new higher-level functions that can be created and
> >       added. Working through these will take some time and coordination
> >       between R-core and maintainers of affected packages.
> >
> >       Once things have gelled a bit more I hope to turn this into a blog
> >       post that will include some examples of moving non-API entry point
> >       uses into compliance.
> >
> >       Best,
> >
> >       luke
> >
> >       --
> >       Luke Tierney
> >       Ralph E. Wareham Professor of Mathematical Sciences
> >       University of Iowa                  Phone:             319-335-3386
> >       Department of Statistics and        Fax:               319-335-3017
> >           Actuarial Science
> >       241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
> >       Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
> >
> >       ______________________________________________
> >       R-devel using r-project.org mailing list
> >       https://stat.ethz.ch/mailman/listinfo/r-devel