[Rd] [External] Re: clarifying and adjusting the C API for R

luke-tierney using uiowa.edu
Fri Jun 7 16:30:24 CEST 2024


On Fri, 7 Jun 2024, Steven Dirkse wrote:

> Thanks for sharing this overview of an interesting and much-needed project.
> You mention that R exports about 1500 symbols (on platforms supporting
> visibility) but this subject isn't mentioned explicitly again in your note,
> so I'm wondering how things tie together.  Un-exported symbols cannot be
> part of the API - how would people use them in this case?  In a perfect
> world the set of exported symbols could define the API or match it exactly,
> but I guess that isn't the case at present.  So I conclude that R exports
> extra (i.e. non-API) symbols.  Is part of the goal to remove these extra
> exports?

No. We'll hide what we can, but base packages, for one, need access to
some entry points that should not be in the API, so those have to stay
un-hidden.
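
For readers unfamiliar with the mechanism, hiding at the C level can be
sketched roughly as follows. This is an illustration only, not R's
actual source: the macro and function names are hypothetical, and the
attribute shown is the GCC/Clang one.

```c
/* Sketch of symbol visibility; all names are hypothetical, not from R. */
#if defined(__GNUC__) || defined(__clang__)
# define DEMO_HIDDEN __attribute__((visibility("hidden")))
#else
# define DEMO_HIDDEN /* no visibility control assumed on other compilers */
#endif

/* Hidden: callable anywhere inside the library being built, but not
   exported from the shared object, so a package cannot link against it. */
DEMO_HIDDEN int demo_internal_scale(int x)
{
    return 2 * x;
}

/* Not hidden: remains in the dynamic symbol table. Staying visible does
   not make it part of a documented API; it only makes it reachable. */
int demo_entry_point(int x)
{
    return demo_internal_scale(x) + 1;
}
```

Built into a shared object on Linux (for example with cc -shared -fPIC),
demo_entry_point should be listed by nm -D while demo_internal_scale
should not.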

Best,

luke

> 
> -Steve
> 
> On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel
> <r-devel using r-project.org> wrote:
>       This is an update on some current work on the C API for use in R
>       extensions.
>
>       The internal R implementation makes use of tens of thousands of C
>       entry points. On Linux and Windows, which support visibility
>       restrictions, most of these are visible only within the R
>       executable or shared library. About 1500 are not hidden and are
>       visible to dynamically loaded shared libraries, such as ones in
>       packages, and to embedding applications.
>
>       There are two main reasons for limiting access to entry points in a
>       software framework:
>
>       - Some entry points are very easy to use in ways that corrupt
>          internal data, leading to segfaults or, worse, incorrect
>          computations without segfaults.
>
>       - Some entry points expose internal structure and other
>          implementation details, which makes it hard to make improvements
>          without breaking client code that has come to depend on these
>          details.
>
>       The API of C entry points that can be used in R extensions, both for
>       packages and embedding, has evolved organically over many years. The
>       definition for the current release, as expressed in the Writing R
>       Extensions manual (WRE), is roughly:
>
>            An entry point can be used if (1) it is declared in a header
>            file in R.home("include"), and (2) it is documented for use in
>            WRE.
>
>       Ideally, (1) would be necessary and sufficient, but for a variety of
>       reasons that isn't achievable, at least not in the near term. (2)
>       can be challenging to determine; in particular, it is not amenable
>       to a computational answer.
>
>       An experimental effort is underway to add annotations to the WRE
>       Texinfo source to allow (2) to be answered unambiguously. The
>       annotations so far mostly reflect my reading of WRE and may be
>       revised as they are reviewed by others. The annotated document can
>       be used for programmatically identifying what is currently
>       considered part of the C API. The result so far is an experimental
>       function tools:::funAPI():
>
>            > head(tools:::funAPI())
>                            name                    loc apitype
>            1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
>            2        alloc3DArray                    WRE     api
>            3          allocArray                    WRE     api
>            4           allocLang                    WRE     api
>            5           allocList                    WRE     api
>            6         allocMatrix                    WRE     api
>
>       The 'apitype' field has three possible levels:
>
>            | api  | stable (ideally) API |
>            | eapi | experimental API     |
>            | emb  | embedding API        |
>
>       Entry points in the embedding API would typically only be used in
>       applications embedding R or providing new front ends, but might be
>       reasonable to use in packages that support embedding.
>
>       The 'loc' field indicates how the entry point is identified as part
>       of an API: explicit mention in WRE, or declaration in a header file
>       identified as fully part of an API.
>
>       [tools:::funAPI() may not be completely accurate as it relies on
>       regular expressions for examining header files considered part of
>       the API rather than proper parsing. But it seems to be pretty close
>       to what can be achieved with proper parsing.  Proper parsing would
>       add dependencies on additional tools, which I would like to avoid
>       for now. One dependency already present is that a C compiler has to
>       be on the search path and cc -E has to run the C pre-processor.]
>
>       Two additional experimental functions are available for analyzing
>       package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
>       These examine installed packages.
>
>       [These may produce some false positives on macOS; they may or may
>       not work on Windows at this point.]
>
>       Using these tools initially showed around 200 non-API entry points
>       used across packages on CRAN and BIOC. Ideally this number should be
>       reduced to zero. This will require a combination of additions to
>       the API and changes in packages.
>
>       Some entry points can safely be added to the API. Around 40 have
>       already been added to WRE with API annotations; another 40 or so can
>       probably be added after review.
>
>       The remainder mostly fall into two groups:
>
>       - Entry points that should never be used in packages, such as
>          SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
>          matter) that can create inconsistent or corrupt internal state.
>
>       - Entry points that depend on the existence of internal structure
>          that might be subject to change, such as the existence of promise
>          objects or internal structure of environments.
>
>       Many, if not most, of these seem to be used in idioms that can
>       either be accomplished with existing higher-level functions already
>       in the API, or by new higher-level functions that can be created
>       and added. Working through these will take some time and
>       coordination between R-core and maintainers of affected packages.
>
>       Once things have gelled a bit more I hope to turn this into a blog
>       post that will include some examples of moving non-API entry point
>       uses into compliance.
>
>       Best,
>
>       luke
>
>       --
>       Luke Tierney
>       Ralph E. Wareham Professor of Mathematical Sciences
>       University of Iowa                  Phone:             319-335-3386
>       Department of Statistics and        Fax:               319-335-3017
>           Actuarial Science
>       241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
>       Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

