[R-pkg-devel] Ensuring permanence and SHA consistency of released CRAN packages for validated software

Duncan Murdoch murdoch.duncan at gmail.com
Wed Mar 16 20:42:44 CET 2022


On 16/03/2022 2:51 p.m., Borini, Stefano wrote:
> Hello,
> 
> Validated software needs to ensure the consistency and reproducibility of its environment, potentially years later, when the audit comes. For this reason, we record the SHA of every package we download from CRAN to ensure that the package has not changed after the fact. A change may signal that the package has been corrupted or that malicious code has been added, and an unchanged SHA assures the auditors that the packages are indeed the correct ones, as they were at the time of release.
> 
> Currently I am dealing with a package that I downloaded once in the past, MASS_7.3-54. This package used to have SHA256
> 
>      b800ccd5b5c2709b1559cf5eab126e4935c4f8826cf7891253432bb6a056e821  MASS_7.3-54.tar.gz
> 
> The current package has instead SHA:
> 
>      eb644c0e94b447c46387aa22436ef5a43192960ee9cfd0df2940f4a4116179ae  MASS_7.3-54.tar.gz
> 
> This triggers all sorts of alarms. It is established poor practice to replace a package after the fact, exactly for these reasons. Once a package is released, it should remain immutable. Subsequent builds can be introduced with a different build number.
> 
> The change appears to be because CRAN occasionally rebuilds packages, for reasons unknown to me. Diffing the old and the new MASS_7.3-54.tar.gz reveals the change to be due to this:
> 
>      $ diff -Naur MASS_1/ MASS_2/
>      diff -Naur MASS_1/DESCRIPTION MASS_2/DESCRIPTION
>      --- MASS_1/DESCRIPTION  2021-05-03 10:03:00.000000000 +0100
>      +++ MASS_2/DESCRIPTION  2021-05-03 10:03:50.000000000 +0100
>      @@ -33,4 +33,4 @@
>         David Firth [ctb]
>       Maintainer: Brian Ripley <ripley using stats.ox.ac.uk>
>       Repository: CRAN
>      -Date/Publication: 2021-05-03 09:03:00 UTC
>      +Date/Publication: 2021-05-03 09:03:50 UTC
>      diff -Naur MASS_1/MD5 MASS_2/MD5
>      --- MASS_1/MD5  2021-05-03 10:03:00.000000000 +0100
>      +++ MASS_2/MD5  2021-05-03 10:03:50.000000000 +0100
>      @@ -1,4 +1,4 @@
>      -560f72bfd93ac57532d2cf113078d2e7 *DESCRIPTION
>      +ecf84f78aac3c625898be45513307d79 *DESCRIPTION
>       35aff05a505ecf7e81e0473767794ca9 *INDEX
>       c7acdc0fa828f781a0a5586ab9d4fa1b *LICENCE.note
>       0ac7b30ad35a4c19ea69d76a6a366b02 *NAMESPACE
> 
> Please prevent SHA changes of released packages on CRAN. Once a package is released, it should not be touched again.
> 
> --
> 
> Stefano Borini
> Principal Analytical Tools Developer
> AstraZeneca R&D BioPharmaceuticals | Data Science & AI | Early Biometrics & Statistical Innovation

I don't know the reason that MASS was built again 50 seconds after the 
first build, and it would be more convenient for you and some other 
people if it hadn't been, but your request comes across as unreasonably 
demanding.

You work for a company with a very large budget.  CRAN is run by 
volunteers, and as far as I know, your company has not contributed 
financially to running it.

If you want to guarantee that a CRAN package can be re-installed years 
from now, *you* should be archiving a copy of it.  You may be negligent 
by not doing so:  there's no guarantee that CRAN will still be 
distributing *any* version of MASS when the auditors show up.

Duncan Murdoch
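
As an illustration of the advice above (keep your own archived copy and check it yourself), here is a minimal sketch in Python. It is not part of the original message: the function names, the `manifest.json` file, and the archive directory layout are all assumptions made for the example, and it uses only the Python standard library.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive(tarball: Path, archive_dir: Path) -> str:
    """Copy a tarball into archive_dir and record its digest.

    The manifest file name (manifest.json) is a hypothetical convention
    chosen for this sketch, not anything CRAN or R prescribes.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    stored = archive_dir / tarball.name
    shutil.copy2(tarball, stored)
    manifest_path = archive_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest[tarball.name] = sha256_of(stored)
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest[tarball.name]


def verify(archive_dir: Path) -> dict:
    """Map each archived file name to True if its digest still matches."""
    manifest = json.loads((archive_dir / "manifest.json").read_text())
    return {
        name: (archive_dir / name).exists()
        and sha256_of(archive_dir / name) == expected
        for name, expected in manifest.items()
    }
```

Under these assumptions, one would call `archive(Path("MASS_7.3-54.tar.gz"), Path("validated-packages"))` once at download time and `verify(Path("validated-packages"))` at audit time. The check then runs against the locally stored bytes, so it does not depend on what CRAN is serving at that point.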


