[R-pkg-devel] New CRAN internet policy

Martin Maechler maechler at stat.math.ethz.ch
Fri Dec 7 09:48:37 CET 2018

>>>>> Hadley Wickham 
>>>>>     on Thu, 6 Dec 2018 10:22:47 -0600 writes:

    > Hi all,
    > I'd love to get some clarification on what the new internet policy
    > means for packages like httr:

    >> Packages which use Internet resources should fail gracefully with an informative
    >> message if the resource is not available (and not give a check warning nor error).

    > It's not clear what "internet resource" means here. If it means
    > dataset, then I think the httr tests and examples are ok. If it means
    > any use of the internet, I'm not sure what to do - httr critically
    > depends on internet access, so I can't see any way to make it fail
    > gracefully.

    > Hadley

I cannot answer your question, notably as I'm not part of the CRAN
team, but as an R Core developer I have many times encountered
the problem that this policy tries to mitigate
(and I also think we should consider going further than the
 above "policy"):

As an R developer, I'd like to see the effect of a change to the
sources of base R, and so eventually I may want to run the
equivalent of 'R CMD check' on all existing CRAN and
Bioconductor packages. If I have access to a server with many
cores and very fast hard disks, I can hope to finish running the
tests within 1-2 days.
But then I have to deal with the result.  The few times I've
done this, the result has been "a mess", because very many
packages nowadays assume in their examples and their regression/unit
tests that internet access to some resources works, which it
"often" does not, and so download.file(),
read.table("http://.....") etc. result in errors sooner or later.
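
To illustrate what "failing gracefully" could look like for such
calls, here is a minimal sketch. The helper name fetch_or_null() and
its 'download' argument are hypothetical (they are not part of R or of
any package); the argument exists only so that the downloader can be
swapped out when no network is available.

```r
# Hypothetical helper: try to fetch 'url'; on any failure emit a message
# and return NULL instead of raising an error.
fetch_or_null <- function(url, destfile = tempfile(),
                          download = utils::download.file) {
  ok <- tryCatch({
    # download.file() returns 0 (invisibly) on success
    status <- suppressWarnings(download(url, destfile, quiet = TRUE))
    identical(as.integer(status), 0L)
  }, error = function(e) FALSE)
  if (!ok) {
    message("Resource not available, skipping: ", url)
    return(NULL)
  }
  destfile
}
```

An example or test would then check the return value, e.g.
if (!is.null(f <- fetch_or_null(url))) read.table(f), so that an
unavailable resource leads to a message rather than a check error.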

Because of that, some packages fail their checks "randomly" (in
the sense that internet resources are unavailable "randomly").
Ideally we'd find a good way to communicate these failures
back to the person / process running (a version of)
'R CMD check', because in the above scenario, I'd like to weed
out the 300 packages that just failed because of internet
resource access failures, and only look at the other packages
that got a change in their 'R CMD check' results.

The recent introduction in R-devel of classed error conditions
(in some cases), e.g.,
https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2018/10/04#n2018-10-04,
and the similar and somewhat earlier effort of Lionel Henry to use
classed error conditions (in rlang only, unfortunately, rather than
as a patch proposal to R ..) may be one step towards a nice solution
here.
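
As a rough sketch of how a classed error condition could mark such
failures: the class name "internetResourceError" and the functions
internet_error() and read_remote() below are assumptions made for
illustration only; base R does not define them.

```r
# Hypothetical constructor for a classed error condition.
internet_error <- function(msg, url, call = NULL) {
  structure(
    class = c("internetResourceError", "error", "condition"),
    list(message = msg, call = call, url = url)
  )
}

# Hypothetical wrapper: re-signal any read failure as the classed error.
# 'reader' defaults to readLines() and can be replaced for testing.
read_remote <- function(url, reader = readLines) {
  tryCatch(
    suppressWarnings(reader(url)),
    error = function(e) stop(internet_error(conditionMessage(e), url))
  )
}

# Map the outcome of an expression to a label, keyed on the error class.
classify <- function(expr) {
  tryCatch(
    { expr; "ok" },
    internetResourceError = function(e) "resource unavailable",
    error = function(e) "other error"
  )
}
```

A caller can then tell the two kinds of failure apart without parsing
error messages: classify(read_remote(url)) yields "resource
unavailable" when the read fails, whereas an unrelated stop() yields
"other error".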

Ideally we'd not just get  'ERROR'  in the check 'Status:' line,
but qualified errors, and if for a package the only errors are caused by
internet resource access failures, I could easily filter them out.
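
The filtering step might then look roughly like this. The data frame
is invented purely for illustration (one row per reported error), and
"internetResourceError" is a hypothetical class name; 'R CMD check'
does not currently report error classes in this form.

```r
# Invented example data: one row per error reported by a check run.
errors <- data.frame(
  package     = c("pkgA", "pkgA", "pkgB", "pkgC"),
  error_class = c("internetResourceError", "simpleError",
                  "internetResourceError", "simpleError"),
  stringsAsFactors = FALSE
)

# For each package: are ALL of its errors internet resource failures?
only_internet <- tapply(errors$error_class == "internetResourceError",
                        errors$package, all)

# Packages that can be set aside, and those that still need a look.
skip      <- names(only_internet)[only_internet]
to_review <- names(only_internet)[!only_internet]
```

Here a package with both kinds of error stays in 'to_review', since
only packages whose errors are exclusively resource failures should be
filtered out.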


