[Bioc-devel] RFC: Bioc repository for single-version packages

Martin Morgan mtmorg@n@bioc @ending from gm@il@com
Mon Nov 5 23:17:40 CET 2018


This is a continuation of the discussion at

https://support.bioconductor.org/p/114814/#114824

where Wolfgang asks about "creating a corner in the Bioconductor package ecosystem for packages that are only ever supposed to build and check with a single release".

I think this would be quite challenging to implement correctly. One would need to ensure, for instance, that the user of an appropriate version of R can easily install the intended dependencies. It is also not clear what exactly it means for a package to be restricted to a single release: CRAN packages are updated without versioned releases [I mean, a user of Bioc 3.7 will get the current version of the CRAN package, not the version that was available at the beginning or end of the 3.7 release], so presumably the idea is that there is a snapshot of the package versions that one requires. This part sounds as much like a job for packrat / switchr etc. Maybe 'our' job is to ensure that the appropriate information is discoverable?

I took as an example the defunct package BioMedR. Our friend Google ("Bioconductor BioMedR") took me to the last-known-good landing page (initially by way of a mirror in Japan...). The DOI on the bioconductor.org version of that page took me to the 'Removed packages' page ( https://bioconductor.org/about/removed-packages/ ), which again points to the last-known-good page. So does https://bioconductor.org/packages/BioMedR . The 'In bioc since' tag on the last-known-good page allowed me to find the version of Bioconductor where the package was introduced. With some work I can find the AMI ( https://bioconductor.org/help/bioconductor-cloud-ami/ ) and docker images ( https://hub.docker.com/r/bioconductor/release_base2/tags/ ) for that release of Bioconductor; neither of these would be sufficient for reproducibility (I could get relevant Bioconductor package versions simply by installing the package from our archive via BiocInstaller / BiocManager, but R packages would be more challenging). The package has an (impressively extensive!) vignette, but the vignette does not include sessionInfo(), so one has to do considerable extra work to find the relevant packages. Again, maybe packrat / switchr help with this...
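To make the sessionInfo() point concrete, here is a minimal sketch of how a vignette's final chunk could record the versions in play. It uses base R only; the helper name session_versions() is hypothetical (it is not part of Bioconductor or of any package mentioned here), and it lists only packages that are attached or loaded via a namespace in the current session, not the full dependency snapshot that packrat / switchr aim to capture.

```r
# Hypothetical helper (an assumption for illustration, not an existing API):
# tabulate name and version of every non-base package in the session.
session_versions <- function(si = sessionInfo()) {
    # otherPkgs: attached packages; loadedOnly: loaded via a namespace only
    pkgs <- c(si$otherPkgs, si$loadedOnly)
    data.frame(
        package = vapply(pkgs, function(p) p$Package, character(1),
                         USE.NAMES = FALSE),
        version = vapply(pkgs, function(p) p$Version, character(1),
                         USE.NAMES = FALSE),
        stringsAsFactors = FALSE
    )
}

# Last chunk of a vignette: print the full record for human readers ...
sessionInfo()

# ... and, optionally, a compact table of package versions.
versions <- session_versions()
print(versions)
```

The plain sessionInfo() call alone would already remove most of the detective work described above; the table is just a more convenient form for someone trying to reinstall matching versions later.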

I think 'incoming' versions of such packages would go through the usual review process, in an attempt to hew to some sort of overall Bioconductor standard of quality; the return on this investment would be limited by the short intended shelf-life of the package. These packages often have unique considerations, too, e.g., 'large' data and long build times, maintainer concerns about when the package is released relative to publication, etc. Also of interest would be commitment to the actual data storage and transfer costs and to the management costs of this type of package, coupled with appropriate consideration of the scope of the repository (not just the Bioconductor cognoscenti, presumably) and advertising of availability, e.g., via https://www.nature.com/sdata/policies/repositories .

Contemplating this type of package repository suggests a number of small items that provide 'cosmetic' improvements to the current situation (e.g., the removed-packages page could be organized in a tabular fashion to include from / to versions); a more meaningful attempt would probably require efforts to embrace packrat / switchr to avoid reinventing the reproducibility wheel, as well as commitment to reviewing and managing these packages for their long-term contribution. These are certainly noble goals and align with Bioconductor's emphasis on reproducibility; is this something that rises to the level of securing separate funding?

Martin

