[R-pkg-devel] Convention or standards for using header library (e.g. Eigen)

Dirk Eddelbuettel edd @end|ng |rom deb|@n@org
Sat Jun 24 19:44:40 CEST 2023


On 24 June 2023 at 21:35, Stephen Wade wrote:
| Doesn't seem like the system package is worth it. Should the convention
| simply be to bundle the headers in the package then? What about package
| size - is there some limit to the size of included libraries/headers to
| consider for CRAN?

Here is one (drastic) example:

  $ du -csh /usr/local/lib/R/site-library/BH
  156M    /usr/local/lib/R/site-library/BH
  156M    total
  $ 

Note that the package was smaller when it started (in 2013). Also, the last
time I checked, the largest package of any kind (not just header packages)
that I know of on CRAN was still about twice as large.

Anyway: as you are starting to see, this is a somewhat complex problem.
Header packages are one approach. _Writing R Extensions_ mentions pure header
packages and name-checks my packages BH, RcppArmadillo and RcppEigen in
Section 1.1.3. I once wrote a short paper on this [1] (also a vignette [2])
where I more or less recommend header packages because compiled ones are so
much harder.  Recognise for example that a) no cross-OS way to check for
packages exists (though pkg-config comes close), b) no general package
managers exist, c) configure and cmake come close (but cmake is also an added
system requirement; and configure is a no-show on Windows) and d) even within
a given OS and release you may have very different versions. Lastly also: e)
some packages (RcppEigen is an example) have patches the system library would
not have applied (!!).

So to me a simplified view is that just as R "abstracts away" POSIX so that
we can always say e.g. 'dir.exists(path)' no matter where R runs, having a
package with headers ensures we get a consistent _and reliable_ compilation
experience from client packages. This matters.

Now, there are clearly downsides. With my Debian maintainer hat on, I have to
defend including Armadillo within RcppArmadillo because the distro has it too
(but then version skew, i.e. d) above, and ease of use and consistency etc.
dominate, so we continue to ship RcppArmadillo).  At the same time, at CRAN we
have needless duplications. For example, my RcppCCTZ package was the first to
offer the nice (Google made but not a Google 'product') CCTZ library for R
use (starting in 2015). But when I last checked a year or so ago, four other
packages now included redundant extra copies. Also happens with Eigen. Not
great.

On the other side, packages with full (included or not) libraries work too,
but it takes more effort to provide them portably, to explain to users where
to get them, to keep them current, and so on.  It is hard (or even impossible)
for R to fill in as a _general system_ package manager across all OSs and
deployments.  There is a new kid on this block [3] we are starting to use at
work, and which may help in time across the platforms that R uses. To be
seen...

So to sum up: I think header packages are great, and I maintain a few, both
large and small in size.  I would encourage you to try them. For RcppEigen,
you can just use LinkingTo: to get its headers.  Some 400+ packages rely on
it. (And it's over 1000 for Armadillo now, and over 300 for BH.)
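
As a minimal sketch of what that looks like in a client package, the relevant
DESCRIPTION fields could read as below. The package name 'mypkg' is purely
hypothetical, the Imports: line assumes the usual Rcpp setup, and the other
required fields are omitted:

  Package: mypkg
  Imports: Rcpp
  LinkingTo: Rcpp, RcppEigen

With that in place the C++ sources under src/ can '#include <RcppEigen.h>',
and R puts the installed headers on the include path when it compiles the
package, so no system copy of Eigen is needed.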

Hth,  Dirk

[1] https://arxiv.org/abs/1911.06416
[2] https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-libraries.pdf
[3] https://vcpkg.io/en/

-- 
dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org


