[R-pkg-devel] CRAN rules re. web scraping?

Adam H Sparks
Thu Jan 23 04:18:50 CET 2020


Hi Spencer,
To add to what Roy has already provided: if you have tests that require
Internet access, you should be using testthat's skip_on_cran() for those
tests, and in your examples wrap the relevant code in \donttest{} to
prevent errors on CRAN servers when the Internet is not available, the
server is not responding, or the resource is unavailable.
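
As a rough sketch of what I mean (read_first_line() and the example.org
URL are made up for illustration, they are not from Ecfun), a test that
touches the network could be guarded like this, with the matching Rd guard
shown in the trailing comments:

```r
library(testthat)

# Hypothetical helper: first line of a remote resource, or NULL on any failure.
read_first_line <- function(u) {
  con <- url(u)                      # created unopened; readLines() opens it
  on.exit(close(con), add = TRUE)
  tryCatch(
    readLines(con, n = 1L, warn = FALSE),
    error = function(e) NULL,
    warning = function(w) NULL
  )
}

test_that("remote resource is readable", {
  skip_on_cran()  # not run on CRAN's check machines
  first <- read_first_line("https://example.org/")  # placeholder URL
  skip_if(is.null(first), "resource unavailable")
  expect_type(first, "character")
})

# In the Rd file (or a roxygen @examples block) the matching guard would be:
# \examples{
# \donttest{
# first <- read_first_line("https://example.org/")
# }
# }
```

The extra skip_if() is optional, but it keeps the test from failing on
your own machine when the remote site happens to be down.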

Using tryCatch() will be helpful for the end-user experience, but will not
completely fix the issue that is being raised here.
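
For the function itself, something along these lines might be a starting
point. It is only a sketch under my own assumptions: read_remote_csv() is
a hypothetical name, not a function in Ecfun, and it writes to a
tempfile() rather than the working directory, checks that the download
actually produced a file, and returns NULL with a message instead of
throwing an error:

```r
# Hypothetical helper: download a CSV to a temporary file and read it,
# returning NULL (with a message) if the download or the parse fails.
read_remote_csv <- function(url, destfile = tempfile(fileext = ".csv")) {
  ok <- tryCatch({
    status <- suppressWarnings(
      utils::download.file(url, destfile, quiet = TRUE)
    )
    # download.file() returns 0 on success; also check the file is really there
    status == 0L && file.exists(destfile) && file.size(destfile) > 0
  }, error = function(e) FALSE)

  if (!isTRUE(ok)) {
    message("Could not download ", url, "; returning NULL.")
    return(invisible(NULL))
  }

  tryCatch(
    utils::read.csv(destfile),
    error = function(e) {
      message("Downloaded file could not be parsed: ", conditionMessage(e))
      NULL
    }
  )
}
```

Callers (and examples) can then test is.null() on the result before going
any further, so a missing or malformed file no longer ends in an error.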


On Thu, 23 Jan 2020 at 11:59, Roy Mendelssohn - NOAA Federal via
R-package-devel <r-package-devel using r-project.org> wrote:

> Hi Spencer:
>
> I think that message means what it says,  and I read it as pretty
> straightforward and business like.  The issue is not web scraping.  There
> are two errors here:
>
> 1.  You cannot write to the user's space without first explicitly asking
> permission of the user.  The suggested policy is to write to a temp
> directory; R has tempdir() and related functions (e.g. tempfile()) for this.
>
> 2.  When accessing something over the internet, failure of the access
> must be checked for and the program must exit gracefully.  The second
> error appears to be that at times on the builds the .csv file is not
> downloaded, but there is no check, so an error is simply thrown.  There
> are a number of ways to catch such errors, such as tryCatch(), which
> will solve this problem.
>
> HTH,
>
> -Roy
>
>
> > On Jan 22, 2020, at 5:48 PM, Spencer Graves <spencer.graves using effectivedefense.org> wrote:
> >
> > Hello, All:
> >
> >
> > GOOD NEWS AND BAD NEWS:
> >
> >
> >       * First the good news:  I heard from Brian Ripley;  see below.
> > His web site says, "He retired in August 2014 on grounds of ill health."
> > (http://www.stats.ox.ac.uk/~ripley/)  I was pleased to see that he seems
> > to be well enough to send me the email below.
> >
> >
> >       * BAD NEWS:  My Ecfun package is violating current CRAN rules
> > regarding "not writing anywhere in the file space".  (See below.)
> >
> >
> > QUESTION:
> >
> >
> >       How do you suggest I respond to this?
> >
> >
> >       It's hard for me to fix, because I cannot replicate the error and
> > I don't understand the rules Prof. Ripley is trying to enforce. The
> > "CRAN Package Check Results for" this package show an error on 1
> > platform (r-devel-linux-x86_64-fedora-gcc), NOTEs on 3 platforms
> > (Fedora-clang and Debian), and "OK" on 9 others.  I can program selected
> > tests not to run on CRAN, e.g., with (!fda::CRAN()).
> >
> >
> >       However, I suspect I should be able to do better than that.
> >
> >
> >       Suggestions?
> >
> >
> >       Thanks,
> >       Spencer Graves
> >
> >
> > p.s.  The development version of this package is available at
> > "https://github.com/sbgraves237/Ecfun".
> >
> >
> > https://cloud.r-project.org/web/checks/check_results_Ecfun.html
> >
> >
> > -------- Forwarded Message --------
> > Subject:      CRAN package Ecfun
> > Date:         Tue, 21 Jan 2020 21:26:02 +0000
> > From:         Prof Brian Ripley <ripley using stats.ox.ac.uk>
> > Reply-To:     CRAN <CRAN using r-project.org>
> > To:   Spencer Graves <spencer.graves using effectivedefense.org>
> > CC:   CRAN <CRAN using r-project.org>
> >
> >
> >
> > This has been intermittently failing its checks for a week: different
> > check runs failed (in the 24h prior to) the 14th, 15th, 17th and today.
> > The current failure is
> >
> > Check: examples
> > Result: ERROR
> > Running examples in ‘Ecfun-Ex.R’ failed
> > The error most likely occurred in:
> >
> >> ### Name: read.testURLs
> >> ### Title: Read a file produced by testURLs
> >> ### Aliases: read.testURLs
> >> ### Keywords: IO
> >>
> >> ### ** Examples
> >>
> >> # Test only 2 web sites, not the default 4,
> >> # and test only twice, not the default 10 times:
> >> tst <- testURLs(c(
> > + PVI="http://en.wikipedia.org/wiki/Cook_Partisan_Voting_Index",
> > + house="http://house.gov/representatives"),
> > + n=2, maxFail=2)
> > 1
> > 1579634784, PVI, TRUE 0.828
> > 1579634785, house, FALSE 0.051
> > 1579634785, house, FALSE 0.048
> > 2
> > 1579634785, PVI, TRUE 0.043
> > 1579634785, house, FALSE 0.11
> > 1579634785, house, FALSE 0.035
> >>
> >> # The above should have created a file 'testURLresults.csv'
> >> # in the working directory. Read it.
> >>
> >> dat <- read.testURLs()
> > Error in read.table(file = file, header = header, sep = sep, quote =
> > quote, :
> > more columns than column names
> > Calls: read.testURLs -> read.csv -> read.table
> >
> > That does not conform to the policy on Internet access, not least as no
> > attempt is made to check if the file was created, let alone that it has
> > the expected layout. Nor does it conform to the policy on not writing
> > anywhere in the file space (and that shows on its CRAN results page too).
> >
> > Please correct ASAP and before Feb 4 to safely retain the package on CRAN.
> >
> > --
> > Brian D. Ripley,                  ripley using stats.ox.ac.uk
> > Emeritus Professor of Applied Statistics, University of Oxford
> >
> >
> > ______________________________________________
> > R-package-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-package-devel
>
> **********************
> "The contents of this message do not reflect any position of the U.S.
> Government or NOAA."
> **********************
> Roy Mendelssohn
> Supervisory Operations Research Analyst
> NOAA/NMFS
> Environmental Research Division
> Southwest Fisheries Science Center
> ***Note new street address***
> 110 McAllister Way
> Santa Cruz, CA 95060
> Phone: (831)-420-3666
> Fax: (831) 420-3980
> e-mail: Roy.Mendelssohn using noaa.gov www: https://www.pfeg.noaa.gov/
>
> "Old age and treachery will overcome youth and skill."
> "From those who have been given much, much will be expected"
> "the arc of the moral universe is long, but it bends toward justice" -MLK
> Jr.
>


-- 
Dr Adam H. Sparks
http://adamhsparks.netlify.com/
Associate Professor of Field Crops Pathology   |   Centre for Crop Health
|  Office C313

Phone (+61) 07 46311948  |  Mobile 0415 489 422 | Twitter @adamhsparks
<https://twitter.com/adamhsparks>

Institute for Life Sciences and the Environment |  Research and Innovation
Division
University of Southern Queensland | Toowoomba, Queensland | 4350 | Australia
