[Rd] [RFC] A case for freezing CRAN

Gavin Simpson ucfagls at gmail.com
Thu Mar 20 02:41:48 CET 2014


"What am I overlooking?"

That this is already available and possible in R today, but perhaps
not widely used. Developers do tend to only include a lower bound if
they include any bounds at all on package dependencies.

As I mentioned elsewhere, R packages often aren't "built" against
other R packages, and developers may often be testing against a range
of versions, some of which may not be on CRAN yet.

Technically, all packages on CRAN would need to have a dependency cap
on R-devel, but as that is a moving target until it is released I
don't see in practice how enforcing an upper limit on the R dependency
would work. The way CRAN works, you can't just set a dependency on R
== 3.0.x say. (As far as I understand CRAN's policies.)

For packages it is quite trivial for the developers to manually add
the required info for the upper bound, less so the lower bound, but you
could just pick a known working version. An upper range on the
dependencies could be stated as whatever version is current on CRAN.
But then what happens? Unbeknownst to you, a few days after you
release to CRAN your package foo with stated dependency on bar >= 1.2,
bar <= 1.8, the developer of bar releases bar v 2.0 and your package
no longer passes checks, CRAN gets in touch and you have to resubmit
another version. This could be desirable as a contribution to
reproducibility, but it demands more effort from both the CRAN
maintainers and package maintainers. Now, this might be an issue
because of CRAN's desire to keep some element of human intervention
in the submission process, but you either work with CRAN or do your
own thing.
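
As a rough illustration of the bounds described above, a DESCRIPTION
for the hypothetical package foo might look like the fragment below.
The names foo and bar and the bar version numbers are taken from the
example above; the Version field and the R (>= 3.0.0) entry are
assumptions added only to make the fragment complete. As far as I
read "Writing R Extensions", a package may appear more than once in
Depends to give both a lower and an upper bound.

```
Package: foo
Version: 0.1-0
Depends: R (>= 3.0.0), bar (>= 1.2), bar (<= 1.8)
```

Once bar 2.0 is on CRAN the upper bound is no longer satisfied, which
is the situation that would prompt the resubmission described above.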

As Bioconductor have shown (for example) it is possible, if people
want to put in time and effort and have a community buy into an ethos,
to achieve staged releases etc.

G

On 19 March 2014 12:58, Carl Boettiger <cboettig at gmail.com> wrote:
> Dear list,
>
> I'm curious what people would think of a more modest proposal at this time:
>
> State the version of the dependencies used by the package authors when the
> package was built.
>
> Eventually CRAN could enforce such a statement be present in the
> description. We encourage users to declare the version of the packages they
> use in publications, so why not have the same expectation of developers?
>  This would help address the problem of archived packages that Jeroen
> raises, as it is currently impossible to reliably install archived
> packages because their dependencies have since been updated and are no
> longer compatible.  (Even if it passes checks and installs, we have no way
> of knowing if the upstream changes have introduced a bug).  This
> information would be relatively straightforward to capture, shouldn't
> change the way anyone currently uses CRAN, and should address a major pain
> point anyone trying to install archived versions from CRAN has probably
> encountered.  What am I overlooking?
>
> Carl
>
>
> On Wed, Mar 19, 2014 at 11:36 AM, Spencer Graves <
> spencer.graves at structuremonitoring.com> wrote:
>
>>       What about having this purpose met with something like an expansion
>> of R-Forge?  We could have packages submitted to R-Forge rather than CRAN,
>> and people who wanted the latest could get it from R-Forge.  If changes I
>> make on R-Forge break a reverse dependency, emails explaining the problem
>> are sent to both me and the maintainer for the package I broke.
>>
>>
>>       The budget for R-Forge would almost certainly need to be increased:
>>  They currently disable many of the tests they once ran.
>>
>>
>>       Regarding budget, the R Project would get more donations if they
>> asked for them and made it easier to contribute.  I've tried multiple times
>> without success to find a way to donate.  I didn't try hard, but it
>> shouldn't be hard ;-)  (And donations should be accepted in US dollars and
>> Euros -- and maybe other currencies.) There should be a procedure whereby
>> anyone could receive a pro forma invoice, which they can pay or ignore as
>> they choose.  I mention this, because many grants could cover a reasonable
>> fee provided they have an invoice.
>>
>>
>>       Spencer Graves
>>
>>
>> On 3/19/2014 10:59 AM, Jeroen Ooms wrote:
>>
>>> On Wed, Mar 19, 2014 at 5:52 AM, Duncan Murdoch <murdoch.duncan at gmail.com>
>>> wrote:
>>>
>>>> I don't see why CRAN needs to be involved in this effort at all.  A third
>>>> party could take snapshots of CRAN at R release dates, and make those
>>>> available to package users in a separate repository.  It is not hard to set
>>>> a different repository than CRAN as the default location from which to
>>>> obtain packages.
>>>
>>> I am happy to see many people giving this some thought and engaging in the
>>> discussion.
>>>
>>> Several have suggested that staging & freezing can be simply done by a
>>> third party. This solution and its limitations are also described in the
>>> paper [1] in the section titled "R: downstream staging and repackaging".
>>>
>>> If this would solve the problem without affecting CRAN, we would
>>> obviously have done this already. In fact, as described in the paper and
>>> pointed out by some people, initiatives such as Debian or Revolution
>>> Enterprise already include a frozen library of R packages. Also, companies
>>> like Google maintain their own internal repository with packages that are
>>> used throughout the company.
>>>
>>> The problem with this approach is that when you are using some 3rd party
>>> package snapshot, your r/sweave scripts will still only be
>>> reliable/reproducible for other users of that specific snapshot. E.g. for
>>> the examples above, a script that is written in R 3.0 by a Debian user is
>>> not guaranteed to work on R 3.0 in Google, or R 3.0 on some other 3rd party
>>> cran snapshot. Hence this solution merely redefines the problem from "this
>>> script depends on pkgA 1.1 and pkgB 0.2.3" to "this script depends on
>>> repository foo 2.0". And given that most users would still be pulling
>>> packages straight from CRAN, it would still be terribly difficult to
>>> reproduce a 5 year old sweave script from e.g. JSS.
>>>
>>> For this reason I believe the only effective place to organize this staging
>>> is all the way upstream, on CRAN. Imagine a world where your r/sweave
>>> script would be reliable/reproducible, out of the box, on any system, any
>>> platform in any company using R 3.0. No need to investigate which
>>> specific packages or cran snapshot the author was using at the time of
>>> writing the script, and trying to reconstruct such libraries for each
>>> script you want to reproduce. No ambiguity about which package versions are
>>> used by R 3.0. However for better or worse, I think this could only be
>>> accomplished with a cran release cycle (i.e. "universal snapshots")
>>> accompanying the already existing r releases.
>>>
>>>
>>>
>>>> The only objection I can see to this is that it requires extra work by the
>>>> third party, rather than extra work by the CRAN team. I don't think the
>>>> total amount of work required is much different.  I'm very unsympathetic to
>>>> proposals to dump work on others.
>>>>
>>>
>>> I am merely trying to discuss a technical issue in an attempt to improve
>>> reliability of our software and reproducibility of papers created with R.
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>
>
>
> --
> Carl Boettiger
> UC Santa Cruz
> http://carlboettiger.info/
>



-- 
Gavin Simpson, PhD


