[Rd] [External] Re: clarifying and adjusting the C API for R

luke-tierney using uiowa.edu
Sat Jun 8 02:52:43 CEST 2024


On Sat, 8 Jun 2024, Reed A. Cartwright wrote:

> Would it be reasonable to move the non-API stuff that cannot be hidden
> into header files inside a "details" directory (or some other specific
> naming scheme)?
>
> That's what I use when I need to separate a public API from an internal API.

As do I, as does everyone else. As I wrote originally: " ... for a
variety of reasons that isn't achievable, at least not in the near
term." Can we leave it at that please?

luke

>
> On Fri, Jun 7, 2024 at 7:30 AM luke-tierney--- via R-devel
> <r-devel using r-project.org> wrote:
>>
>> On Fri, 7 Jun 2024, Steven Dirkse wrote:
>>
>>> Thanks for sharing this overview of an interesting and much-needed project.
>>> You mention that R exports about 1500 symbols (on platforms supporting
>>> visibility) but this subject isn't mentioned explicitly again in your note,
>>> so I'm wondering how things tie together.  Un-exported symbols cannot be
>>> part of the API - how would people use them in this case?  In a perfect
>>> world the set of exported symbols could define the API or match it exactly,
>>> but I guess that isn't the case at present.  So I conclude that R exports
>>> extra (i.e. non-API) symbols.  Is part of the goal to remove these extra
>>> exports?
>>
>> No. We'll hide what we can, but base packages, for one, need access
>> to some entry points that should not be in the API, so those have to
>> stay un-hidden.
>>
>> Best,
>>
>> luke
>>
>>>
>>> -Steve
>>>
>>> On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel
>>> <r-devel using r-project.org> wrote:
>>>       This is an update on some current work on the C API for use in R
>>>       extensions.
>>>
>>>       The internal R implementation makes use of tens of thousands of C
>>>       entry points. On Linux and Windows, which support visibility
>>>       restrictions, most of these are visible only within the R
>>>       executable or shared library. About 1500 are not hidden and are
>>>       visible to dynamically loaded shared libraries, such as ones in
>>>       packages, and to embedding applications.
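
As a rough illustration of the visibility restriction described above, the sketch below marks one function as hidden and one as exported using the GCC/Clang `visibility` attribute. The macro and function names (`SKETCH_HIDDEN`, `SKETCH_VISIBLE`, `sketch_internal_step`, `sketch_public_entry`) are hypothetical and are not taken from the R sources; the example assumes GCC or Clang on a non-Windows platform and falls back to empty macros elsewhere.

```c
/* Hypothetical macros for illustration only; assumes GCC or Clang on a
   non-Windows platform. Elsewhere they expand to nothing. */
#if defined(__GNUC__) && !defined(_WIN32)
# define SKETCH_HIDDEN  __attribute__((visibility("hidden")))
# define SKETCH_VISIBLE __attribute__((visibility("default")))
#else
# define SKETCH_HIDDEN
# define SKETCH_VISIBLE
#endif

/* Internal helper: callable from anywhere inside the same executable or
   shared library, but not exported to code that loads the library. */
SKETCH_HIDDEN int sketch_internal_step(int x)
{
    return x * 2;
}

/* Exported entry point: remains visible to dynamically loaded code. */
SKETCH_VISIBLE int sketch_public_entry(int x)
{
    return sketch_internal_step(x) + 1;
}
```

If this were built as a shared library on Linux, one would expect `nm -D` on the result to list `sketch_public_entry` but not `sketch_internal_step`; callers inside the library can still use both.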
>>>
>>>       There are two main reasons for limiting access to entry points in a
>>>       software framework:
>>>
>>>       - Some entry points are very easy to use in ways that corrupt internal
>>>          data, leading to segfaults or, worse, incorrect computations without
>>>          segfaults.
>>>
>>>       - Some entry points expose internal structure and other implementation
>>>          details, which makes it hard to make improvements without breaking
>>>          client code that has come to depend on these details.
>>>
>>>       The API of C entry points that can be used in R extensions, both for
>>>       packages and embedding, has evolved organically over many years. The
>>>       definition for the current release expressed in the Writing R
>>>       Extensions manual (WRE) is roughly:
>>>
>>>            An entry point can be used if (1) it is declared in a header file
>>>            in R.home("include"), and (2) if it is documented for use in WRE.
>>>
>>>       Ideally, (1) would be necessary and sufficient, but for a variety of
>>>       reasons that isn't achievable, at least not in the near term. (2) can
>>>       be challenging to determine; in particular, it is not amenable to a
>>>       computational answer.
>>>
>>>       An experimental effort is underway to add annotations to the WRE
>>>       Texinfo source to allow (2) to be answered unambiguously. The
>>>       annotations so far mostly reflect my reading of WRE and may be revised
>>>       as they are reviewed by others. The annotated document can be used for
>>>       programmatically identifying what is currently considered part of the C
>>>       API. The result so far is an experimental function tools:::funAPI():
>>>
>>>           > head(tools:::funAPI())
>>>                            name                    loc apitype
>>>            1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
>>>            2        alloc3DArray                    WRE     api
>>>            3          allocArray                    WRE     api
>>>            4           allocLang                    WRE     api
>>>            5           allocList                    WRE     api
>>>            6         allocMatrix                    WRE     api
>>>
>>>       The 'apitype' field has three possible levels
>>>
>>>            | api  | stable (ideally) API |
>>>            | eapi | experimental API     |
>>>            | emb  | embedding API        |
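
To make the shape of this data concrete, here is a minimal C sketch of the three 'apitype' levels together with a lookup over the six rows shown in the head() output above. All identifiers (`sketch_apitype`, `sketch_api_row`, `sketch_lookup`, `sketch_apitype_label`) are hypothetical; nothing here is part of R or of the tools package, and the table holds only the excerpt quoted in this message.

```c
#include <stddef.h>
#include <string.h>

/* The three 'apitype' levels, plus a marker for "not found". */
typedef enum {
    SKETCH_API,   /* stable (ideally) API */
    SKETCH_EAPI,  /* experimental API */
    SKETCH_EMB,   /* embedding API */
    SKETCH_NONE   /* not present in the excerpt below */
} sketch_apitype;

typedef struct {
    const char *name;        /* entry point name */
    const char *loc;         /* how it was identified as API */
    sketch_apitype apitype;
} sketch_api_row;

/* Rows copied from the head(tools:::funAPI()) output quoted above. */
static const sketch_api_row sketch_rows[] = {
    { "Rf_AdobeSymbol2utf8", "R_ext/GraphicsDevice.h", SKETCH_EAPI },
    { "alloc3DArray",        "WRE",                    SKETCH_API  },
    { "allocArray",          "WRE",                    SKETCH_API  },
    { "allocLang",           "WRE",                    SKETCH_API  },
    { "allocList",           "WRE",                    SKETCH_API  },
    { "allocMatrix",         "WRE",                    SKETCH_API  },
};

/* Linear search by name; returns SKETCH_NONE when the name is not in
   this six-row excerpt (which says nothing about the full table). */
sketch_apitype sketch_lookup(const char *name)
{
    size_t n = sizeof sketch_rows / sizeof sketch_rows[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(sketch_rows[i].name, name) == 0)
            return sketch_rows[i].apitype;
    return SKETCH_NONE;
}

/* Human-readable description of a level, following the table above. */
const char *sketch_apitype_label(sketch_apitype t)
{
    switch (t) {
    case SKETCH_API:  return "stable (ideally) API";
    case SKETCH_EAPI: return "experimental API";
    case SKETCH_EMB:  return "embedding API";
    default:          return "not in this excerpt";
    }
}
```

For example, `sketch_lookup("allocMatrix")` yields `SKETCH_API`, while a name outside the excerpt yields `SKETCH_NONE`.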
>>>
>>>       Entry points in the embedded API would typically only be used in
>>>       applications embedding R or providing new front ends, but might be
>>>       reasonable to use in packages that support embedding.
>>>
>>>       The 'loc' field indicates how the entry point is identified as part of
>>>       an API: explicit mention in WRE, or declaration in a header file
>>>       identified as fully part of an API.
>>>
>>>       [tools:::funAPI() may not be completely accurate as it relies on
>>>       regular expressions for examining header files considered part of the
>>>       API rather than proper parsing. But it seems to be pretty close to
>>>       what can be achieved with proper parsing.  Proper parsing would add
>>>       dependencies on additional tools, which I would like to avoid for
>>>       now. One dependency already present is that a C compiler has to be on
>>>       the search path and cc -E has to run the C pre-processor.]
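
To give a feel for why a regular-expression scan of header text is close to, but not the same as, proper parsing, the sketch below pulls a function name out of a single declaration line using POSIX `regcomp`/`regexec`. The pattern and the helper name `sketch_decl_name` are assumptions made for illustration; this is not the pattern or the code used by tools:::funAPI(), which is written in R.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the first identifier that is directly followed by '(' from 'line'
   into 'out'. Returns 1 on success, 0 if nothing matched or 'out' is too
   small. Hypothetical helper: it will misread function pointers,
   function-like macros and declarations split across lines, which is the
   kind of inaccuracy a real parser would avoid. */
int sketch_decl_name(const char *line, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    int found = 0;

    if (outlen == 0)
        return 0;
    /* Group 1: a C identifier; then optional whitespace and a '('. */
    if (regcomp(&re, "([A-Za-z_][A-Za-z0-9_]*)[[:space:]]*\\(",
                REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, line + m[1].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

Applied to a line such as `SEXP allocMatrix(SEXPTYPE, int, int);` this extracts `allocMatrix`; a line without a parenthesis yields no match.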
>>>
>>>       Two additional experimental functions are available for analyzing
>>>       package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
>>>       These examine installed packages.
>>>
>>>       [These may produce some false positives on macOS; they may or may not
>>>       work on Windows at this point.]
>>>
>>>       Using these tools initially showed around 200 non-API entry points
>>>       used across packages on CRAN and BIOC. Ideally this number should be
>>>       reduced to zero. This will require a combination of additions to the
>>>       API and changes in packages.
>>>
>>>       Some entry points can safely be added to the API. Around 40 have
>>>       already been added to WRE with API annotations; another 40 or so can
>>>       probably be added after review.
>>>
>>>       The remainder mostly fall into two groups:
>>>
>>>       - Entry points that should never be used in packages, such as
>>>          SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
>>>          matter) that can create inconsistent or corrupt internal state.
>>>
>>>       - Entry points that depend on the existence of internal structure that
>>>          might be subject to change, such as the existence of promise objects
>>>          or internal structure of environments.
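
The first group can be illustrated with a toy example. The sketch below is not R's vector representation and does not reproduce what SETLENGTH does; it uses a hypothetical `sketch_vec` type to show, in general terms, how a raw setter that overwrites one field can break an invariant that a higher-level function would maintain. All names are invented for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy vector, invented for illustration. Invariant: length <= capacity. */
typedef struct {
    size_t length;    /* elements in use */
    size_t capacity;  /* elements allocated */
    double *data;
} sketch_vec;

/* Allocate a zero-filled vector of n elements. Returns 1 on success. */
int sketch_vec_init(sketch_vec *v, size_t n)
{
    v->data = n ? calloc(n, sizeof *v->data) : NULL;
    if (n && v->data == NULL)
        return 0;
    v->length = n;
    v->capacity = n;
    return 1;
}

void sketch_vec_free(sketch_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->length = v->capacity = 0;
}

/* Raw setter: changes the recorded length and nothing else, so it can
   leave the object claiming more elements than were allocated. */
void sketch_raw_set_length(sketch_vec *v, size_t n)
{
    v->length = n;
}

/* Checks the invariant that the raw setter can violate. */
int sketch_vec_is_consistent(const sketch_vec *v)
{
    return v->length <= v->capacity;
}

/* Higher-level alternative: grows the allocation when needed and zeroes
   any new elements, so the invariant holds. Returns 1 on success. */
int sketch_vec_resize(sketch_vec *v, size_t n)
{
    if (n > v->capacity) {
        double *p = realloc(v->data, n * sizeof *p);
        if (p == NULL)
            return 0;
        v->data = p;
        v->capacity = n;
    }
    for (size_t i = v->length; i < n; i++)
        v->data[i] = 0.0;
    v->length = n;
    return 1;
}
```

After `sketch_raw_set_length(&v, 100)` on a four-element vector, any code trusting `v.length` would read past the allocation; `sketch_vec_resize(&v, 100)` reaches the same length without breaking the invariant.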
>>>
>>>       Many, if not most, of these seem to be used in idioms that can either
>>>       be accomplished with existing higher-level functions already in the
>>>       API, or by new higher level functions that can be created and
>>>       added. Working through these will take some time and coordination
>>>       between R-core and maintainers of affected packages.
>>>
>>>       Once things have gelled a bit more I hope to turn this into a blog
>>>       post that will include some examples of moving non-API entry point
>>>       uses into compliance.
>>>
>>>       Best,
>>>
>>>       luke
>>>
>>>       --
>>>       Luke Tierney
>>>       Ralph E. Wareham Professor of Mathematical Sciences
>>>       University of Iowa                  Phone:             319-335-3386
>>>       Department of Statistics and        Fax:               319-335-3017
>>>           Actuarial Science
>>>       241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
>>>       Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
>>>
>>>       ______________________________________________
>>>       R-devel using r-project.org mailing list
>>>       https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>
>

-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney using uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu/