[Rd] clarifying and adjusting the C API for R
Hiroaki Yutani
yut@n|@|n| @end|ng |rom gm@||@com
Sun Jun 9 06:29:19 CEST 2024
Thanks so much for your wonderful work, Luke!
I didn't expect such a clarification to happen this soon. This is really
great.
For convenience, I created a quick web page for searching the results of
tools:::funAPI().
https://yutannihilation.github.io/R-fun-API/
Hope this helps those who are too lazy to install R-devel to check.
Best,
Yutani
On Thu, Jun 6, 2024 at 23:47, luke-tierney--- via R-devel <r-devel using r-project.org> wrote:
> This is an update on some current work on the C API for use in R
> extensions.
>
> The internal R implementation makes use of tens of thousands of C
> entry points. On Linux and Windows, which support visibility
> restrictions, most of these are visible only within the R executable or
> shared library. About 1500 are not hidden and are visible to
> dynamically loaded shared libraries, such as ones in packages, and to
> embedding applications.
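
As a side note on the visibility restrictions mentioned above, here is a minimal sketch, assuming GCC or Clang on an ELF platform such as Linux, of how a shared library can hide an entry point. The macro and function names are hypothetical and are not taken from the R sources; this only illustrates the general mechanism, not how R's build is actually configured.

```c
#include <stddef.h>

/* Hypothetical macros: GCC/Clang visibility attributes, no-ops elsewhere. */
#if defined(__GNUC__)
# define ATTR_HIDDEN  __attribute__((visibility("hidden")))
# define ATTR_VISIBLE __attribute__((visibility("default")))
#else
# define ATTR_HIDDEN
# define ATTR_VISIBLE
#endif

/* Hypothetical internal helper: usable inside the library, but not
   exported to code that loads the library dynamically. */
ATTR_HIDDEN int internal_length(const char *s)
{
    int n = 0;
    while (s != NULL && s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical exported entry point: the only symbol clients can link to. */
ATTR_VISIBLE int public_length(const char *s)
{
    return internal_length(s);
}
```

When this is built as a shared library, client code can call public_length() but cannot resolve internal_length(), even though both are ordinary non-static C functions.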
>
> There are two main reasons for limiting access to entry points in a
> software framework:
>
> - Some entry points are very easy to use in ways that corrupt internal
> data, leading to segfaults or, worse, incorrect computations without
> segfaults.
>
> - Some entry points expose internal structure and other implementation
> details, which makes it hard to make improvements without breaking
> client code that has come to depend on these details.
>
> The API of C entry points that can be used in R extensions, both for
> packages and embedding, has evolved organically over many years. The
> definition for the current release expressed in the Writing R
> Extensions manual (WRE) is roughly:
>
> An entry point can be used if (1) it is declared in a header file
> in R.home("include"), and (2) if it is documented for use in WRE.
>
> Ideally, (1) would be necessary and sufficient, but for a variety of
> reasons that isn't achievable, at least not in the near term. (2) can
> be challenging to determine; in particular, it is not amenable to a
> computational answer.
>
> An experimental effort is underway to add annotations to the WRE
> Texinfo source to allow (2) to be answered unambiguously. The
> annotations so far mostly reflect my reading of WRE and may be revised
> as they are reviewed by others. The annotated document can be used for
> programmatically identifying what is currently considered part of the C
> API. The result so far is an experimental function tools:::funAPI():
>
> > head(tools:::funAPI())
>                  name                    loc apitype
> 1 Rf_AdobeSymbol2utf8 R_ext/GraphicsDevice.h    eapi
> 2        alloc3DArray                    WRE     api
> 3          allocArray                    WRE     api
> 4           allocLang                    WRE     api
> 5           allocList                    WRE     api
> 6         allocMatrix                    WRE     api
>
> The 'apitype' field has three possible levels:
>
> | api | stable (ideally) API |
> | eapi | experimental API |
> | emb | embedding API |
>
> Entry points in the embedding API would typically only be used in
> applications embedding R or providing new front ends, but might be
> reasonable to use in packages that support embedding.
>
> The 'loc' field indicates how the entry point is identified as part of
> an API: explicit mention in WRE, or declaration in a header file
> identified as fully part of an API.
>
> [tools:::funAPI() may not be completely accurate as it relies on
> regular expressions for examining header files considered part of the
> API rather than proper parsing. But it seems to be pretty close to
> what can be achieved with proper parsing. Proper parsing would add
> dependencies on additional tools, which I would like to avoid for
> now. One dependency already present is that a C compiler has to be on
> the search path and cc -E has to run the C pre-processor.]
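
To illustrate why regular expressions only approximate proper parsing here, a minimal sketch in C using POSIX <regex.h>. The helper decl_name() is hypothetical and is not how tools:::funAPI() is implemented; it assumes one declaration per line and simply takes the first identifier that is directly followed by an opening parenthesis.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the funAPI() implementation: copy the first
   identifier that is directly followed by '(' on a declaration line into
   `out`. Returns 1 on a match that fits in `out`, 0 otherwise. */
int decl_name(const char *line, char *out, size_t outlen)
{
    regex_t re;
    regmatch_t m[2];
    int found = 0;

    if (outlen == 0)
        return 0;
    /* an identifier, optional blanks, then an opening parenthesis */
    if (regcomp(&re, "([A-Za-z_][A-Za-z0-9_]*)[ \t]*\\(", REG_EXTENDED) != 0)
        return 0;
    if (regexec(&re, line, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < outlen) {
            memcpy(out, line + m[1].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }
    regfree(&re);
    return found;
}
```

For a plain prototype such as "SEXP Rf_allocVector(SEXPTYPE, R_xlen_t);" this picks out Rf_allocVector. For a function-pointer declaration such as "void (*handler)(void);" (a made-up example) it picks out "void" instead, which is the kind of miss that proper parsing would avoid.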
>
> Two additional experimental functions are available for analyzing
> package compliance: tools:::checkPkgAPI and tools:::checkAllPkgsAPI.
> These examine installed packages.
>
> [These may produce some false positives on macOS; they may or may not
> work on Windows at this point.]
>
> Using these tools initially showed around 200 non-API entry points
> used across packages on CRAN and BIOC. Ideally this number should be
> reduced to zero. This will require a combination of additions to the
> API and changes in packages.
>
> Some entry points can safely be added to the API. Around 40 have
> already been added to WRE with API annotations; another 40 or so can
> probably be added after review.
>
> The remainder mostly fall into two groups:
>
> - Entry points that should never be used in packages, such as
> SET_OBJECT or SETLENGTH (or any non-API SETXYZ functions for that
> matter) that can create inconsistent or corrupt internal state.
>
> - Entry points that depend on the existence of internal structure that
> might be subject to change, such as the existence of promise objects
> or internal structure of environments.
>
> Many, if not most, of these seem to be used in idioms that can either
> be accomplished with existing higher-level functions already in the
> API, or by new higher-level functions that can be created and
> added. Working through these will take some time and coordination
> between R-core and maintainers of affected packages.
>
> Once things have gelled a bit more I hope to turn this into a blog
> post that will include some examples of moving non-API entry point
> uses into compliance.
>
> Best,
>
> luke
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa Phone: 319-335-3386
> Department of Statistics and Fax: 319-335-3017
> Actuarial Science
> 241 Schaeffer Hall email: luke-tierney using uiowa.edu
> Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>