[R-pkg-devel] Bioconductor reverse dependency checks for a CRAN package

Dirk Eddelbuettel edd @end|ng |rom deb|@n@org
Tue Jan 30 17:32:36 CET 2024


Ivan,

On 30 January 2024 at 18:56, Ivan Krylov via R-package-devel wrote:
| Hello R-package-devel,
| 
| What would you recommend in order to run reverse dependency checks for
| a package with 182 direct strong dependencies from CRAN and 66 from
| Bioconductor (plus 3 more from annotations and experiments)?
| 
| Without extra environment variables, R CMD check requires the Suggested
| packages to be available, which means installing...
| 
| revdepdep <- package_dependencies(revdep, which = 'most')
| revdeprest <- package_dependencies(
|  unique(unlist(revdepdep)),
|  which = 'strong', recursive = TRUE
| )
| length(setdiff(
|  unlist(c(revdepdep, revdeprest)),
|  unlist(standard_package_names())
| ))
| 
| ...up to 1316 packages. 7 of these suggested packages aren't on CRAN or
| Bioconductor (because they've been archived or have always lived on
| GitHub), but even if I filter those out, it's not easy. Some of the
| Bioconductor dependencies are large; I now have multiple gigabytes of
| genome fragments and mass spectra, but also a 500-megabyte arrow.so in
| my library. As long as a data package declares a dependency on your
| package, it still has to be installed and checked, right?
| 
| Manually installing the SystemRequirements is no fun at all, so I've
| tried the rocker/r2u container. It got me most of the way there, but
| there were a few remaining packages with newer versions on CRAN. For

If that happens, please file an issue ticket at the r2u site.  CRAN should be
current, as I update every business day whenever p3m does, so r2u will be as
current as any approach using p3m (and it encodes the genuine system
dependencies).

BioConductor in r2u is both more manual (I try to update "every few days")
and of course not complete, so if you miss a package _from BioConductor_,
again, please just file an issue ticket.
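
Before filing such a ticket, it can help to list exactly which installed
packages lag behind CRAN. The following is a minimal sketch in R, not part of
r2u: the helper name `lagging_packages` is hypothetical, and the commented
usage assumes network access and the cloud.r-project.org mirror.

```r
# Return the names of packages whose installed version is older than the
# version offered by a repository. Both arguments are named character
# vectors of version strings (names are package names).
lagging_packages <- function(installed, available) {
  common <- intersect(names(installed), names(available))
  behind <- vapply(
    common,
    function(p) package_version(installed[[p]]) < package_version(available[[p]]),
    logical(1)
  )
  common[behind]
}

# Usage (commented out because it needs network access):
# inst  <- installed.packages()[, "Version"]
# avail <- available.packages(repos = "https://cloud.r-project.org")[, "Version"]
# lagging_packages(inst, avail)
```

Base R's old.packages() reports similar information; the helper above only
makes the comparison explicit.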

| these, I had to install the system packages manually in order to build
| them from source.

For what it is worth, my own go-to for many years has been a VM in which I
install 'all packages needed' for the rev.deps to be checked. Doing it with an
on-demand 'lambda function (one per package tested)' based on r2u would be a
nice alternative but I don't have the AWS credits to try it...

| Someone told me to try the rocker/r-base container together with pak.
| It was more proactive at telling me about dependency conflicts and
| would have got me most of the way there too, except it somehow got me a
| 'stringi' binary without the corresponding libicu*.so*, which stopped
| the installation process. Again, nothing that a bit of manual work
| wouldn't fix, but I don't feel comfortable setting this up on a CI
| system. (Not on every commit, of course - that would be extremely
| wasteful - but it would be nice if it was possible to run these checks
| before release on a different computer and spot more problems this way.)
| 
| I can't help but notice that neither install.packages() nor pak() is
| the recommended way to install Bioconductor packages. Could that
| introduce additional problems with checking the reverse dependencies?

As Martin already told you, BioConductor has always had their own
installation wrapper because they are a 'little different' with the bi-annual
release cycle.
 
| Then there's the check_packages_in_dir() function itself. Its behaviour
| about the reverse dependencies is not very helpful: they are removed
| altogether or at least moved away. Something may be wrong with my CRAN
| mirror, because some of the downloaded reverse dependencies come out
| with a size of zero and subsequently fail the check very quickly.
| 
| I am thinking of keeping a separate persistent library with all the
| 1316 dependencies required to check the reverse dependencies and a

As stated above, that is what works for me. I used to use a chroot directory
on a 'big server', now I use a small somewhat underpowered VM.
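
A persistent library of this kind only needs topping up between runs. The
sketch below is one possible way to find what is still missing; it is not a
description of any particular setup. The helper name `missing_packages` and
the library path in the usage comment are hypothetical, and `revdepdep` and
`revdeprest` refer to the objects computed in the quoted code further up.

```r
# Which of the `needed` packages are neither installed in the persistent
# library `lib` nor part of the standard (base and recommended) packages?
missing_packages <- function(needed, lib) {
  have <- rownames(installed.packages(lib.loc = lib))
  setdiff(needed, c(have, unlist(tools::standard_package_names())))
}

# Usage (commented out because it installs from the network):
# lib <- "~/revdep-lib"                       # hypothetical location
# dir.create(lib, showWarnings = FALSE)
# todo <- missing_packages(unique(unlist(c(revdepdep, revdeprest))), lib)
# if (length(todo)) install.packages(todo, lib = lib)
```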

| persistent directory with the reverse dependencies themselves. Instead
| of using the reverse=... argument, I'm thinking of using the following
| scheme:
| 
| 1. Use package_dependencies() to determine the list of packages to test.
| 2. Use download.packages() to download the latest version of everything
| if it doesn't already exist. Retry if got zero-sized or otherwise
| damaged tarballs. Remove old versions of packages if a newer version
| exists.
| 3. Run check_packages_in_dir() on the whole directory with the
| downloaded reverse dependencies.
| 
| For this to work, I need a way to run step (3) twice, ensuring that one
| of the runs is performed with the CRAN version of the package in the
| library and the other one is performed with the to-be-released version
| of the package in the library. Has anyone already come up with an
| automated way to do that?
| 
| No wonder nobody wants to maintain the XML package.

Well, a few of us maintain packages with quite a tail and cope. Rcpp has 2700,
RcppArmadillo has over 100, BH a few hundred. These aren't 'light'. I wrote
myself the `prrd` package (on CRAN) for this, others have other tools -- Team
data.table managed to release 1.15.0 to CRAN today too. So this clearly is
possible. But it is a worthy discussion topic so thanks for raising it.
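
On the quoted question of running the checks twice, once against the CRAN
version and once against the to-be-released version, a rough sketch in R
follows. It is not how `prrd` works; it assumes a Unix-like system, two library
directories that differ only in the package under test, and that each check
directory contains the usual 00check.log with a final "Status:" line. All
function names are hypothetical.

```r
# Run R CMD check on one tarball with the given libraries first on the
# library path. `lib` may hold several directories, for example the
# library containing the package under test plus the shared dependency
# library. Results land in <outdir>/<pkg>.Rcheck.
check_with_lib <- function(tarball, lib, outdir) {
  dir.create(outdir, recursive = TRUE, showWarnings = FALSE)
  libs <- paste(lib, collapse = .Platform$path.sep)
  system2(
    file.path(R.home("bin"), "R"),
    c("CMD", "check", "--no-manual",
      paste0("--output=", shQuote(outdir)), shQuote(tarball)),
    env = paste0("R_LIBS=", shQuote(libs)),
    stdout = FALSE, stderr = FALSE
  )
}

# Extract the final status ("OK", "1 NOTE", "1 ERROR", ...) from a
# <pkg>.Rcheck directory, or NA if no status line was written.
read_status <- function(checkdir) {
  log <- file.path(checkdir, "00check.log")
  if (!file.exists(log)) return(NA_character_)
  st <- grep("^Status: ", readLines(log, warn = FALSE), value = TRUE)
  if (length(st)) sub("^Status: ", "", st[length(st)]) else NA_character_
}

# Given two named character vectors of statuses (names are package names),
# return the packages whose status differs between the two runs.
changed_status <- function(old, new) {
  pkgs <- intersect(names(old), names(new))
  o <- old[pkgs]
  n <- new[pkgs]
  differs <- (is.na(o) != is.na(n)) | (!is.na(o) & !is.na(n) & o != n)
  pkgs[differs]
}
```

Only the packages returned by changed_status() need a closer look, which keeps
the manual review small even with a few hundred reverse dependencies.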

Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org


