[Rd] [External] Re: clarifying and adjusting the C API for R
luke-tierney using uiowa.edu
Sat Jun 8 02:52:43 CEST 2024
On Sat, 8 Jun 2024, Reed A. Cartwright wrote:
> Would it be reasonable to move the non-API stuff that cannot be hidden
> into header files inside a "details" directory (or some other specific
> naming scheme)?
>
> That's what I use when I need to separate a public API from an internal API.
As do I, as does everyone else. As I wrote originally: " ... for a
variety of reasons that isn't achievable, at least not in the near
term." Can we leave it at that please?
luke
>
> On Fri, Jun 7, 2024 at 7:30 AM luke-tierney--- via R-devel
> <r-devel using r-project.org> wrote:
>>
>> On Fri, 7 Jun 2024, Steven Dirkse wrote:
>>
>>> Thanks for sharing this overview of an interesting and much-needed project.
>>> You mention that R exports about 1500 symbols (on platforms supporting
>>> visibility) but this subject isn't mentioned explicitly again in your note,
>>> so I'm wondering how things tie together. Un-exported symbols cannot be
>>> part of the API - how would people use them in this case? In a perfect
>>> world the set of exported symbols could define the API or match it exactly,
>>> but I guess that isn't the case at present. So I conclude that R exports
>>> extra (i.e. non-API) symbols. Is part of the goal to remove these extra
>>> exports?
>>
>> No. We'll hide what we can, but base packages, for one, need access
>> to some entry points that should not be in the API, so those have to
>> stay un-hidden.
>>
>> Best,
>>
>> luke
>>
>>>
>>> -Steve
>>>
>>> On Thu, Jun 6, 2024 at 10:47 AM luke-tierney--- via R-devel
>>> <r-devel using r-project.org> wrote:
>>> This is an update on some current work on the C API for use in R
>>> extensions.
>>>
>>> The internal R implementation makes use of tens of thousands of C
>>> entry points. On Linux and Windows, which support visibility
>>> restrictions, most of these are visible only within the R executable
>>> or shared library. About 1500 are not hidden and are visible to
>>> dynamically loaded shared libraries, such as ones in packages, and to
>>> embedding applications.
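
A minimal sketch of the hidden versus visible distinction described above, assuming a GCC or Clang toolchain that accepts the `visibility` attribute. The macro and function names (`TOY_HIDDEN`, `TOY_EXPORT`, `toy_internal_scale`, `toy_public_double`) are hypothetical and are not taken from the R sources; other toolchains, including typical Windows ones, control exports through different mechanisms.

```c
#include <assert.h>
#include <limits.h>

/* Visibility macros; the attribute syntax is GCC/Clang specific,
 * so the macros expand to nothing elsewhere (an assumption of this sketch). */
#if defined(__GNUC__) || defined(__clang__)
#  define TOY_HIDDEN __attribute__((visibility("hidden")))
#  define TOY_EXPORT __attribute__((visibility("default")))
#else
#  define TOY_HIDDEN
#  define TOY_EXPORT
#endif

/* Internal helper: callable from anywhere inside this library, but not
 * among the symbols a dynamically loaded client can resolve. */
TOY_HIDDEN int toy_internal_scale(int x)
{
    assert(x >= INT_MIN / 2 && x <= INT_MAX / 2); /* avoid signed overflow */
    return 2 * x;
}

/* Public entry point: remains visible to code that loads the library. */
TOY_EXPORT int toy_public_double(int x)
{
    return toy_internal_scale(x);
}
```

When this file is built into a shared library, a client can call `toy_public_double` but should get an unresolved-symbol error if it tries to link against `toy_internal_scale`.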
>>>
>>> There are two main reasons for limiting access to entry points in a
>>> software framework:
>>>
>>> - Some entry points are very easy to use in ways that corrupt
>>>   internal data, leading to segfaults or, worse, incorrect
>>>   computations without segfaults.
>>>
>>> - Some entry points expose internal structure and other
>>>   implementation details, which makes it hard to make improvements
>>>   without breaking client code that has come to depend on these
>>>   details.
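
To illustrate the first reason with a toy case: the sketch below contrasts a raw setter with a higher-level operation that preserves an internal invariant. The `toy_vec` type and both functions are hypothetical and say nothing about how R lays out its own objects; they only show why an unchecked setter is easy to misuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical vector type; the invariant is length <= capacity. */
typedef struct {
    size_t length;    /* elements in use */
    size_t capacity;  /* elements allocated */
    double *data;
} toy_vec;

/* Low-level setter: overwrites the length without checking capacity,
 * so a caller can silently break the invariant. */
void toy_set_length_unchecked(toy_vec *v, size_t n)
{
    v->length = n;
}

/* Higher-level operation: grows the storage when needed and zero-fills
 * newly exposed elements. Returns 1 on success, 0 on allocation failure. */
int toy_resize(toy_vec *v, size_t n)
{
    assert(v != NULL);
    if (n > v->capacity) {
        double *p = realloc(v->data, n * sizeof *p);
        if (p == NULL)
            return 0;               /* leave the vector unchanged */
        v->data = p;
        v->capacity = n;
    }
    for (size_t i = v->length; i < n; i++)
        v->data[i] = 0.0;
    v->length = n;
    return 1;
}
```

After `toy_set_length_unchecked(&v, 100)` on a four-element vector, any code that trusts `length` reads past the allocation, whereas `toy_resize` keeps the structure consistent.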
>>>
>>> The API of C entry points that can be used in R extensions, both for
>>> packages and embedding, has evolved organically over many years. The
>>> definition for the current release expressed in the Writing R
>>> Extensions manual (WRE) is roughly:
>>>
>>>     An entry point can be used if (1) it is declared in a header file
>>>     in R.home("include"), and (2) if it is documented for use in WRE.
>>>
>>> Ideally, (1) would be necessary and sufficient, but for a variety of
>>> reasons that isn't achievable, at least not in the near term. (2) can
>>> be challenging to determine; in particular, it is not amenable to a
>>> computational answer.
>>>
>>> An experimental effort is underway to add annotations to the WRE
>>> Texinfo source to allow (2) to be answered unambiguously. The
>>> annotations so far mostly reflect my reading of WRE and may be revised
>>> as they are reviewed by others. The annotated document can be used for
>>> programmatically identifying what is currently considered part of the
>>> C API. The result so far is an experimental function
>>> tools:::funAPI():
>>>
>>>     > head(tools:::funAPI())
>>>                      name                    loc apitype
>>>     1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
>>>     2        alloc3DArray                    WRE     api
>>>     3          allocArray                    WRE     api
>>>     4           allocLang                    WRE     api
>>>     5           allocList                    WRE     api
>>>     6         allocMatrix                    WRE     api
>>>
>>> The 'apitype' field has three possible levels:
>>>
>>>     | api  | stable (ideally) API |
>>>     | eapi | experimental API     |
>>>     | emb  | embedding API        |
>>>
>>> Entry points in the embedding API would typically only be used in
>>> applications embedding R or providing new front ends, but might be
>>> reasonable to use in packages that support embedding.
>>>
>>> The 'loc' field indicates how the entry point is identified as part of
>>> an API: explicit mention in WRE, or declaration in a header file
>>> identified as fully part of an API.
>>>
>>> [tools:::funAPI() may not be completely accurate as it relies on
>>> regular expressions for examining header files considered part of the
>>> API rather than proper parsing. But it seems to be pretty close to
>>> what can be achieved with proper parsing. Proper parsing would add
>>> dependencies on additional tools, which I would like to avoid for
>>> now. One dependency already present is that a C compiler has to be on
>>> the search path and cc -E has to run the C pre-processor.]
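
As a rough illustration of why regular-expression scanning gets close to, but not all the way to, proper parsing, here is a minimal sketch that pulls a function name out of a single-line declaration. It assumes POSIX `regex.h` is available; the helper `toy_decl_name` and its pattern are hypothetical and are not the approach or the expressions used by tools:::funAPI().

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the function name from a one-line C
 * declaration into `out`. Returns 1 on a match, 0 otherwise. */
int toy_decl_name(const char *line, char *out, size_t outlen)
{
    /* Return-type tokens, then a space, tab or '*', then the name,
     * then an opening parenthesis. */
    static const char *pattern =
        "^[A-Za-z_][A-Za-z0-9_ \t*]*[ \t*]([A-Za-z_][A-Za-z0-9_]*)[ \t]*\\(";
    regex_t re;
    regmatch_t m[2];
    int found = 0;

    assert(line != NULL && out != NULL && outlen > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, line, 2, m, 0) == 0 && m[1].rm_so >= 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {          /* leave room for the terminator */
            memcpy(out, line + m[1].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

A line such as `SEXP allocMatrix(SEXPTYPE, int, int);` yields `allocMatrix`, but a function-pointer declaration such as `void (*handler)(int);` is not matched, and declarations split across lines would be missed entirely. Those are the kinds of cases where a real parser would do better.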
>>>
>>> Two additional experimental functions are available for analyzing
>>> package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
>>> These examine installed packages.
>>>
>>> [These may produce some false positives on macOS; they may or may not
>>> work on Windows at this point.]
>>>
>>> Using these tools initially showed around 200 non-API entry points
>>> used across packages on CRAN and BIOC. Ideally this number should be
>>> reduced to zero. This will require a combination of additions to the
>>> API and changes in packages.
>>>
>>> Some entry points can safely be added to the API. Around 40 have
>>> already been added to WRE with API annotations; another 40 or so can
>>> probably be added after review.
>>>
>>> The remainder mostly fall into two groups:
>>>
>>> - Entry points that should never be used in packages, such as
>>>   SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
>>>   matter) that can create inconsistent or corrupt internal state.
>>>
>>> - Entry points that depend on the existence of internal structure that
>>>   might be subject to change, such as the existence of promise objects
>>>   or internal structure of environments.
>>>
>>> Many, if not most, of these seem to be used in idioms that can either
>>> be accomplished with existing higher-level functions already in the
>>> API, or by new higher-level functions that can be created and
>>> added. Working through these will take some time and coordination
>>> between R-core and maintainers of affected packages.
>>>
>>> Once things have gelled a bit more I hope to turn this into a blog
>>> post that will include some examples of moving non-API entry point
>>> uses into compliance.
>>>
>>> Best,
>>>
>>> luke
>>>
>>> --
>>> Luke Tierney
>>> Ralph E. Wareham Professor of Mathematical Sciences
>>> University of Iowa Phone: 319-335-3386
>>> Department of Statistics and Fax: 319-335-3017
>>> Actuarial Science
>>> 241 Schaeffer Hall email: luke-tierney using uiowa.edu
>>> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
>>>
>>> ______________________________________________
>>> R-devel using r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney using uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu/