[R-pkg-devel] New CRAN internet policy

Fri Dec 21 17:35:17 CET 2018

On Fri, Dec 7, 2018 at 2:48 AM Martin Maechler
<maechler using stat.math.ethz.ch> wrote:
>
> >>>>> Hadley Wickham
> >>>>>     on Thu, 6 Dec 2018 10:22:47 -0600 writes:
>
>     > Hi all,
>     > I'd love to get some clarification on what the new internet policy
>     > means for packages like httr:
>
>     >> Packages which use Internet resources should fail gracefully with an informative
>     >> message if the resource is not available (and not give a check warning nor error).
>
>     > It's not clear what "internet resource" means here. If it means
>     > a dataset, then I think the httr tests and examples are OK. If it means
>     > any use of the internet, I'm not sure what to do: httr critically
>     > depends on internet access, so I can't see any way to make it fail
>     > gracefully.
>
>     > Hadley
>
> I cannot answer your question, not least because I'm not part of the CRAN
> team, but as an R Core developer I've often encountered the problem
> this policy tries to mitigate (though I also think we should consider
> going further than the above "policy"):
>
> As an R developer, I'd like to see the effect of a change to the
> sources of base R, so eventually I may want to run the
> equivalent of 'R CMD check' on all existing CRAN and
> Bioconductor packages. If I have access to a server with many
> cores and very fast hard disks, I can hope to finish running the
> tests within 1-2 days.
> But then I have to deal with the result. The few times I've
> done this, the result has been "a mess", because many, many
> packages nowadays assume in their examples and their regression/unit
> tests that internet access to some resources works, which it
> "often" does not, so download.file(),
> read.table("http://.....") etc. result in errors sooner or later.
>
> Because of that, some packages fail their checks "randomly" (in
> the sense that internet resources are unavailable "randomly").
> Ideally we'd find a very good way to communicate these failures
> back to the person or process running (a version of)
> 'R CMD check', because in the above scenario I'd like to weed
> out the 300 packages that failed just because of internet
> resource access failures, and only look at the other packages
> whose 'R CMD check' results changed.

We now have decent tooling for this in revdepcheck
(https://www.github.com/r-lib/revdepcheck; we're planning a CRAN
submission next year). After performing all the revdep checks, you can
run revdep_add_broken() to recheck packages that failed in the
previous round. In my experience testing httr (whose revdeps
obviously use the internet a lot), this resolves most of the randomness,
since it's fairly unlikely to get two random failures in a row.

My main concern about making the checks on examples and tests
stricter is that I think the primary result will be that people simply
do less testing and write fewer realistic examples, which is a net
negative for the community. When you want people to do the right
thing, I think you have to provide a carrot along with the stick.
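
For concreteness, here is a minimal sketch of the "fail gracefully with an
informative message" behaviour that the quoted policy asks of examples and
tests. The helper read_remote_csv() and the URL are hypothetical, not part
of httr or base R; only tryCatch(), message() and read.csv() are real base R
functions. The sketch assumes that any warning or error while opening the
connection means the resource is unavailable, so it would also swallow
benign read.csv() warnings.

```r
# Hypothetical helper (not a real httr/base R function): read a remote CSV,
# or return NULL with an informative message if it cannot be reached.
read_remote_csv <- function(url) {
  tryCatch(
    utils::read.csv(url),
    # Opening an unreachable URL first warns, then errors; catch both.
    warning = function(w) {
      message("Could not read ", url, ": ", conditionMessage(w))
      NULL
    },
    error = function(e) {
      message("Could not read ", url, ": ", conditionMessage(e))
      NULL
    }
  )
}

# In an example or test: skip the rest instead of erroring.
dat <- read_remote_csv("https://example.invalid/data.csv")
if (is.null(dat)) {
  message("Resource unavailable; skipping the rest of this example.")
} else {
  print(summary(dat))
}
```

This only covers the "gracefully" part. It does not tell whoever runs
'R CMD check' *why* the example was skipped, which is the gap Martin
describes.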

> The recent introduction in R-devel of classed error conditions
> (in some cases), e.g.,
> https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2018/10/04#n2018-10-04,
> and the similar, somewhat earlier effort by Lionel Henry to use classed
> error conditions (in rlang only, unfortunately, rather than as a patch
> proposal to R ...) may be one step towards a nice solution here.

We'd be happy to propose a patch to base R, but it's not yet clear to
us exactly how things should work, so I think it makes the most sense
to prototype in a package first.
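
As a rough illustration only (not how R-devel or rlang actually implement
it), a classed condition could let a check runner tell "internet resource
unavailable" apart from a genuine failure. The class name
internet_resource_error and the helpers fetch_lines() and check_example()
are hypothetical. structure(), stop() with a condition object, tryCatch()
and readLines() are base R. For simplicity, the sketch treats any failure
inside readLines() as an internet failure. A real design would wrap only
the network step.

```r
# Hypothetical condition constructor: inherits from "error", so existing
# error handlers still catch it, but can also be matched by its own class.
internet_resource_error <- function(url, parent) {
  structure(
    class = c("internet_resource_error", "error", "condition"),
    list(
      message = paste0("Internet resource unavailable: ", url,
                       " (", conditionMessage(parent), ")"),
      call = NULL,
      url = url
    )
  )
}

# Hypothetical fetcher: re-signal connection failures with the new class.
fetch_lines <- function(url) {
  tryCatch(
    suppressWarnings(readLines(url, warn = FALSE)),
    error = function(e) stop(internet_resource_error(url, e))
  )
}

# Hypothetical check runner: handlers are tried in the order given,
# so the specific class is matched before the generic "error".
check_example <- function(url) {
  tryCatch(
    {
      fetch_lines(url)
      "ok"
    },
    internet_resource_error = function(e) "skipped: internet resource unavailable",
    error = function(e) paste("failed:", conditionMessage(e))
  )
}
```

A runner built this way could report the "skipped" cases separately,
leaving only genuine changes in the results to look at.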

Hadley

-- 
http://hadley.nz