[R-pkg-devel] New CRAN internet policy
Hadley Wickham
h@wickh@m @ending from gm@il@com
Fri Dec 21 17:35:17 CET 2018
On Fri, Dec 7, 2018 at 2:48 AM Martin Maechler
<maechler using stat.math.ethz.ch> wrote:
>
> >>>>> Hadley Wickham
> >>>>> on Thu, 6 Dec 2018 10:22:47 -0600 writes:
>
> > Hi all,
> > I'd love to get some clarification on what the new internet policy
> > means for packages like httr:
>
> >> Packages which use Internet resources should fail gracefully with an informative
> >> message if the resource is not available (and not give a check warning nor error).
>
> > It's not clear what "internet resource" means here. If it means a
> > dataset, then I think the httr tests and examples are ok. If it means
> > any use of the internet, I'm not sure what to do: httr critically
> > depends on internet access, so I can't see any way to make it fail
> > gracefully.
>
> > Hadley
>
> I cannot answer your question, not least because I'm not part of the CRAN
> team, but as an R Core developer I have many times encountered the
> problem this policy tries to mitigate (though I also think we should
> consider going further than the above "policy"):
>
> As an R developer, I'd like to see the effect of a change to the
> sources of base R, and so eventually I may want to run the
> equivalent of 'R CMD check' on all existing CRAN and
> Bioconductor packages. If I have access to a server with many
> cores and very fast hard disks, I can hope to finish running the
> tests within 1-2 days.
> But then I have to deal with the result. The few times I've
> done this, the result has been "a mess" because very many
> packages nowadays assume in their examples and their regression/unit
> tests that internet access to some resources works, which it
> "often" does not, and so download.file(),
> read.table("http://.....") etc. result in errors sooner or later.
>
> Because of that, some packages fail their checks "randomly" (in
> the sense that internet resources are unavailable "randomly").
> Ideally we'd find a good way to communicate these failures back
> to the person / process running (a version of) 'R CMD check',
> because in the above scenario I'd like to weed out the 300
> packages that failed only because an internet resource could not
> be accessed, and look only at the other packages whose
> 'R CMD check' results changed.

We now have decent tooling for this in revdepcheck
(https://www.github.com/r-lib/revdepcheck, planned for CRAN
submission next year). After performing all the reverse dependency
checks, you can run revdep_add_broken() to recheck the packages that
failed in the previous round; in my experience testing httr (whose
revdeps obviously use the internet a lot), this resolves most of the
randomness, since it's fairly unlikely to get two random failures in a row.

My main concern about making the checks of examples and tests
stricter is that I think the primary result will be that people simply
do less testing and write fewer realistic examples, which is a net
negative for the community. If you want people to do the right
thing, I think you have to provide a carrot along with the stick.

> The recent introduction in R-devel of classed error conditions
> (in some cases), e.g.,
> https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2018/10/04#n2018-10-04,
> and the similar, somewhat earlier effort of Lionel Henry to use
> classed error conditions (in rlang only, unfortunately, rather than
> as a patch proposal to R) may be one step towards a nice solution here.

We'd be happy to propose a patch to base R, but it's not yet clear to
us exactly how things should work, so I think it makes the most sense
to prototype in a package first.
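A minimal sketch of what such a package-level prototype might look like, assuming a hypothetical condition class `internet_resource_error` and a hypothetical wrapper `fetch_resource()` (neither is part of base R, rlang, or httr). Any failure to reach a resource is re-signalled as a classed error. An example or test can then fail gracefully with an informative message, as the policy asks, and a process running checks could in principle tell these failures apart from other errors.

```r
# Hypothetical condition constructor (not part of base R or rlang):
# builds an error object with an extra class so handlers can single
# out failures to reach an internet resource.
internet_resource_error <- function(url, parent = NULL) {
  msg <- sprintf("Internet resource not available: %s", url)
  if (!is.null(parent)) {
    msg <- paste0(msg, " (", conditionMessage(parent), ")")
  }
  structure(
    list(message = msg, call = NULL, url = url),
    class = c("internet_resource_error", "error", "condition")
  )
}

# Hypothetical wrapper around utils::download.file(): any warning or
# error raised while downloading is re-signalled as the classed error.
fetch_resource <- function(url, destfile = tempfile()) {
  tryCatch(
    {
      status <- utils::download.file(url, destfile, quiet = TRUE)
      if (status != 0) {
        stop("download.file() returned non-zero status ", status)
      }
      destfile
    },
    warning = function(w) stop(internet_resource_error(url, w)),
    error = function(e) stop(internet_resource_error(url, e))
  )
}

# In an example or test: report the problem and carry on instead of
# producing a check error. ".invalid" is a reserved TLD that never resolves.
result <- tryCatch(
  fetch_resource("https://example.invalid/data.csv"),
  internet_resource_error = function(e) {
    message(conditionMessage(e))
    NULL
  }
)
```

Handling by class rather than by matching message text keeps the handler independent of how download.file() words its warnings and errors, which can differ between download methods and platforms.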
Hadley
--
http://hadley.nz