[Rd] [RFC] A case for freezing CRAN

Wed Mar 19 19:50:58 CET 2014

On Wed, Mar 19, 2014 at 12:59 PM, Jeroen Ooms <jeroen.ooms at stat.ucla.edu> wrote:
> On Wed, Mar 19, 2014 at 5:52 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
>> I don't see why CRAN needs to be involved in this effort at all.  A third
>> party could take snapshots of CRAN at R release dates, and make those
>> available to package users in a separate repository.  It is not hard to set
>> a different repository than CRAN as the default location from which to
>> obtain packages.
>>
>
> I am happy to see many people giving this some thought and engaging in the
> discussion.
>
> Several have suggested that staging & freezing can simply be done by a
> third party. This solution and its limitations are also described in the
> paper [1] in the section titled "R: downstream staging and repackaging".
>
> If this would solve the problem without affecting CRAN, we would obviously
> have done it already. In fact, as described in the paper and pointed out by
> some people, initiatives such as Debian or Revolution Enterprise already
> include a frozen library of R packages. Also companies like Google maintain
> their own internal repository with packages that are used throughout the
> company.
>
The suggested solution is not described in the referenced article.  It
was not suggested that it be the operating system's responsibility to
distribute snapshots, nor was it suggested to create binary
repositories for specific operating systems, nor was it suggested to
freeze only a subset of CRAN packages.

> The problem with this approach is that when you use some 3rd party
> package snapshot, your R/Sweave scripts will still only be
> reliable/reproducible for other users of that specific snapshot. E.g. for
> the examples above, a script that is written in R 3.0 by a Debian user is
> not guaranteed to work on R 3.0 at Google, or R 3.0 on some other 3rd party
> CRAN snapshot. Hence this solution merely redefines the problem from "this
> script depends on pkgA 1.1 and pkgB 0.2.3" to "this script depends on
> repository foo 2.0". And given that most users would still be pulling
> packages straight from CRAN, it would still be terribly difficult to
> reproduce a 5 year old Sweave script from e.g. JSS.
>
This can be solved by the third party making the repository public.
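
As a rough sketch of what that could look like for a script author
(the URL below is purely hypothetical, not an existing service, and
pkgA/pkgB are the placeholder names from the example above), the
script would point R at the public snapshot before installing
anything:

```r
# Assumption: a third party publishes a frozen snapshot at this
# (made-up) URL; substitute the real address of such a repository.
options(repos = c(CRAN = "https://snapshots.example.org/cran/R-3.0"))

# install.packages() now resolves packages against the snapshot
# instead of the live CRAN mirror.
install.packages(c("pkgA", "pkgB"))
```

Anyone rerunning the script would then fetch the same package
versions, provided the snapshot stays online and unchanged.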

> For this reason I believe the only effective place to organize this staging
> is all the way upstream, on CRAN. Imagine a world where your R/Sweave
> script would be reliable/reproducible, out of the box, on any system, any
> platform, in any company using R 3.0. No need to investigate which
> specific packages or CRAN snapshot the author was using at the time of
> writing the script, or to try to reconstruct such libraries for each
> script you want to reproduce. No ambiguity about which package versions are
> used by R 3.0. However, for better or worse, I think this could only be
> accomplished with a CRAN release cycle (i.e. "universal snapshots")
> accompanying the already existing R releases.
>
This could be done by a public third-party repository, independent of
CRAN.  However, you would need to find a way to actively _prevent_
people from installing newer versions of packages with the stable R
releases.
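
Short of actually preventing such installs, one could at least detect
them. The following is only a minimal sketch under stated assumptions:
the helper name newer_than_frozen and the "frozen" manifest (package
name -> frozen version) are hypothetical, and the versions shown are
illustrative, not taken from any real snapshot.

```r
# Hypothetical helper: return the names of installed packages whose
# version is newer than the version recorded in a frozen manifest.
# Both arguments are named character vectors (package -> version).
newer_than_frozen <- function(installed, frozen) {
  # Only packages present in both vectors can be compared.
  common <- intersect(names(installed), names(frozen))
  is_newer <- vapply(
    common,
    function(pkg) {
      package_version(installed[[pkg]]) > package_version(frozen[[pkg]])
    },
    logical(1)
  )
  common[is_newer]
}

# Illustrative data only.
frozen    <- c(pkgA = "1.1", pkgB = "0.2.3")
installed <- c(pkgA = "1.1", pkgB = "0.3.0", pkgC = "2.0")
drifted   <- newer_than_frozen(installed, frozen)
```

On a real system the "installed" vector could be taken from
installed.packages()[, "Version"]. Note that this only reports drift
after the fact; it does not stop anyone from installing a newer
version.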

--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com