[R-pkg-devel] Ensuring permanence and SHA consistency of released CRAN packages for validated software
Borini, Stefano
@te|@no@bor|n| @end|ng |rom @@tr@zenec@@com
Thu Mar 17 11:48:11 CET 2022
Related to this, there's also been discussion (here or on R-devel), of
having `R CMD build` produce identical tarballs when the input doesn't
change, but the injection of `Packaged: <timestamp>; <user>` to the
`DESCRIPTION` file prevents this. If I recall correctly, there was at
least some discussion on being able to control, or anonymize, the
<user> part.
Yes, you can’t have timestamps in metadata if you want a reproducible build.
Well, you can, provided that you use a different strategy. For example, use a superformat (e.g. like a Python wheel) instead of a plain tar.gz, and keep the metainfo inside that. Or you can provide a GPG signature. Even if the SHA were to change, one can check integrity against the signature and know the package hasn’t been tampered with.
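
To illustrate checking integrity when volatile metadata is present, here is a minimal Python sketch that hashes a source tarball's contents while skipping the parts that change between rebuilds. It assumes the only volatile parts are the `Packaged:` and `Date/Publication:` lines of `DESCRIPTION` and the top-level `MD5` file (taken from this thread, not from any CRAN specification), and that those fields occupy a single line each. The name `normalized_digest` is hypothetical.

```python
import hashlib
import tarfile

# Assumption: these are the only parts that differ between rebuilds of the
# same sources (taken from the diff and discussion in this thread).
VOLATILE_FIELDS = (b"Packaged:", b"Date/Publication:")
VOLATILE_FILES = ("MD5",)


def normalized_digest(tarball_path):
    """SHA-256 over file names and contents, skipping volatile metadata."""
    digest = hashlib.sha256()
    with tarfile.open(tarball_path, "r:gz") as tar:
        # Sort members so the result does not depend on archive order.
        members = sorted(
            (m for m in tar.getmembers() if m.isfile()),
            key=lambda m: m.name,
        )
        for member in members:
            # Drop the top-level directory: "MASS/DESCRIPTION" -> "DESCRIPTION".
            relative = member.name.split("/", 1)[-1]
            if relative in VOLATILE_FILES:
                continue
            data = tar.extractfile(member).read()
            if relative == "DESCRIPTION":
                # Assumes each volatile field fits on one line.
                lines = data.splitlines(keepends=True)
                data = b"".join(
                    line for line in lines
                    if not line.startswith(VOLATILE_FIELDS)
                )
            # Name and length are hashed too, so file boundaries are unambiguous.
            digest.update(relative.encode("utf-8") + b"\0")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()
```

Two tarballs that differ only in those fields would yield the same digest, while any change to code or other metadata would not. This complements the plain SHA of the tarball and does not replace it.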
MRAN (https://mran.microsoft.com/timemachine) provides a daily
snapshot of CRAN, and it goes back several years, but I'm not sure if
that would solve your problem. It's only stable for a particular date,
but I'd guess that in this case it could pick up one build one day,
and the other one the next day.
I don’t believe in the snapshot model. It doesn’t scale, and the reality of development, especially agile development, is that I have to mix and match depending on what users require me to add.
The library I need may not be present in the snapshot, or may be present only in a version that is too old, or its constraints may not be satisfied (I doubt one can ensure a consistent dependency tree, with all constraints respected, across more than 9000 packages and counting).
There are a few working groups over at the R Consortium
(https://www.r-consortium.org/projects/isc-working-groups) who are
interested in reproducibility of R packages. I suspect the 'R
Validation Hub' working group (https://www.pharmar.org/overview/)
would be interested in these types of hiccups, even if it's just to
collect rare "incidents" like this one. I suggest you ping them as
well.
Will do. Thanks.
As I said, I have some time available for open-source projects and external collaborations, and I can help, but it really depends on what one can do. I know from experience that package management and dependencies are a really hard battle.
/Henrik
On Wed, Mar 16, 2022 at 12:45 PM Duncan Murdoch
<murdoch.duncan using gmail.com> wrote:
>
> On 16/03/2022 2:51 p.m., Borini, Stefano wrote:
> > Hello,
> >
> > Validated software needs to ensure the consistency and reproducibility of its environment, potentially years later, when the audit comes. For this reason, we record the SHA of every package we download from CRAN to ensure that the package has not changed after the fact. A change may signal that the package has been corrupted or that malicious code has been added, and an unchanged SHA assures the auditors that the packages are indeed the ones that existed at the time of release.
> >
> > Currently I am dealing with a package that I downloaded once in the past, MASS_7.3-54. This package used to have SHA256
> >
> > b800ccd5b5c2709b1559cf5eab126e4935c4f8826cf7891253432bb6a056e821 MASS_7.3-54.tar.gz
> >
> > The current package has instead SHA:
> >
> > eb644c0e94b447c46387aa22436ef5a43192960ee9cfd0df2940f4a4116179ae MASS_7.3-54.tar.gz
> >
> > This triggers all sorts of alarms. It is established poor practice to replace a package after the fact, exactly for these reasons. Once a package is released, it should remain immutable. Subsequent builds can be introduced with a different build number.
> >
> > The change appears to be due to the fact that CRAN occasionally rebuilds packages, for reasons unknown to me. Diffing the old and the new MASS_7.3-54.tar.gz reveals the change to be due to this:
> >
> > $ diff -Naur MASS_1/ MASS_2/
> > diff -Naur MASS_1/DESCRIPTION MASS_2/DESCRIPTION
> > --- MASS_1/DESCRIPTION 2021-05-03 10:03:00.000000000 +0100
> > +++ MASS_2/DESCRIPTION 2021-05-03 10:03:50.000000000 +0100
> > @@ -33,4 +33,4 @@
> > David Firth [ctb]
> > Maintainer: Brian Ripley <ripley using stats.ox.ac.uk>
> > Repository: CRAN
> > -Date/Publication: 2021-05-03 09:03:00 UTC
> > +Date/Publication: 2021-05-03 09:03:50 UTC
> > diff -Naur MASS_1/MD5 MASS_2/MD5
> > --- MASS_1/MD5 2021-05-03 10:03:00.000000000 +0100
> > +++ MASS_2/MD5 2021-05-03 10:03:50.000000000 +0100
> > @@ -1,4 +1,4 @@
> > -560f72bfd93ac57532d2cf113078d2e7 *DESCRIPTION
> > +ecf84f78aac3c625898be45513307d79 *DESCRIPTION
> > 35aff05a505ecf7e81e0473767794ca9 *INDEX
> > c7acdc0fa828f781a0a5586ab9d4fa1b *LICENCE.note
> > 0ac7b30ad35a4c19ea69d76a6a366b02 *NAMESPACE
> >
> > Please prevent SHA changes of released packages on CRAN. Once a package is released, it should not be touched again.
> >
> > --
> >
> > Stefano Borini
> > Principal Analytical Tools Developer
> > AstraZeneca R&D BioPharmaceuticals | Data Science & AI | Early Biometrics & Statistical Innovation
>
> I don't know the reason that MASS was built again 50 seconds after the
> first build, and it would be more convenient for you and some other
> people if it hadn't been, but your request comes across as unreasonably
> demanding.
>
> You work for a company with a very large budget. CRAN is run by
> volunteers, and as far as I know, your company has not contributed
> financially to running it.
>
> If you want to guarantee that a CRAN package can be re-installed years
> from now, *you* should be archiving a copy of it. You may be negligent
> by not doing so: there's no guarantee that CRAN will still be
> distributing *any* version of MASS when the auditors show up.
>
> Duncan Murdoch
>
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel
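
For anyone following the advice above to keep a local archive, a recorded SHA-256 can be checked against the archived copy with a few lines of Python. This is only a sketch: the function names `sha256_of` and `verify` and the archive path in the comment are hypothetical, and the digest shown is the one quoted in the original message.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA-256 matches the recorded value."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# Hypothetical usage against a locally archived tarball:
# verify(
#     "archive/MASS_7.3-54.tar.gz",
#     "b800ccd5b5c2709b1559cf5eab126e4935c4f8826cf7891253432bb6a056e821",
# )
```

Checking the archived file rather than a fresh download means a later rebuild on the mirror does not affect the result.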