[R-pkg-devel] CRAN rules re. web scraping?

Roy Mendelssohn - NOAA Federal roy.mendelssohn sending from noaa.gov
Thu Jan 23 02:57:26 CET 2020


Hi Spencer:

I think that message means what it says, and I read it as pretty straightforward and businesslike. The issue is not web scraping. There are two errors here:

1.  You cannot write to the user's file space without first explicitly asking the user's permission. The suggested policy is to write to a temporary directory; R has tempdir() and related functions for doing this.

2.  When accessing something over the internet, failure of the access must be checked for, and the program must exit gracefully. The second error appears to be that at times on the builds the .csv file is not downloaded, but there is no check, so an error is simply thrown. There are a number of ways to catch such errors, such as try() or tryCatch(), which will solve this problem.
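
A minimal sketch of both points, under some assumptions: write_results() and fetch_csv() are hypothetical helper names (they are not part of Ecfun or base R), and "results.csv" is just an illustrative file name. The first helper writes under tempdir() unless the caller explicitly passes another directory; the second wraps the download and the read in tryCatch() so that a failure produces a message and a NULL result instead of an error.

```r
# Hypothetical helper: write a data frame to a CSV file. The default
# location is the session temp directory, so nothing lands in the user's
# file space unless the caller explicitly supplies another 'dir'.
write_results <- function(x, dir = tempdir(), name = "results.csv") {
  path <- file.path(dir, name)
  utils::write.csv(x, path, row.names = FALSE)
  invisible(path)
}

# Hypothetical helper: download a CSV to a temp file and read it.
# Returns a data frame, or NULL (with a message) if any step fails.
fetch_csv <- function(url, destfile = tempfile(fileext = ".csv")) {
  # download.file() can signal an error or a warning on failure;
  # capture either one rather than letting it propagate.
  status <- tryCatch(
    utils::download.file(url, destfile, quiet = TRUE),
    error = function(e) e,
    warning = function(w) w
  )
  failed <- inherits(status, "condition") ||
    !identical(as.integer(status), 0L) ||
    !file.exists(destfile)
  if (failed) {
    message("Could not download ", url, "; returning NULL.")
    return(NULL)
  }
  # The file exists, but its layout may still be unexpected,
  # so guard the read as well.
  dat <- tryCatch(utils::read.csv(destfile), error = function(e) NULL)
  if (is.null(dat)) {
    message("Downloaded file could not be parsed as CSV; returning NULL.")
  }
  dat
}
```

An example or test would then check the result before using it, e.g. `dat <- fetch_csv(url); if (!is.null(dat)) summary(dat)`, so that an unreachable site does not turn into a check failure.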

HTH,

-Roy


> On Jan 22, 2020, at 5:48 PM, Spencer Graves <spencer.graves using effectivedefense.org> wrote:
> 
> Hello, All:
> 
> 
> GOOD NEWS AND BAD NEWS:
> 
> 
>       * First the good news:  I heard from Brian Ripley;  see below.  
> His web site says, "He retired in August 2014 on grounds of ill health." 
> (http://www.stats.ox.ac.uk/~ripley/)  I was pleased to see that he seems 
> to be well enough to send me the email below.
> 
> 
>       * BAD NEWS:  My Ecfun package is violating current CRAN rules 
> regarding "not writing anywhere in the file space".  (See below.)
> 
> 
> QUESTION:
> 
> 
>       How do you suggest I respond to this?
> 
> 
>       It's hard for me to fix, because I cannot replicate the error and 
> I don't understand the rules Prof. Ripley is trying to enforce. The 
> "CRAN Package Check Results for" this package show an error on 1 
> platform (r-devel-linux-x86_64-fedora-gcc), NOTEs on 3 platforms 
> (Fedora-clang and Debian), and "OK" on 9 others.  I can program selected 
> tests not to run on CRAN, e.g., with (!fda::CRAN()).
> 
> 
>       However, I suspect I should be able to do better than that.
> 
> 
>       Suggestions?
> 
> 
>       Thanks,
>       Spencer Graves
> 
> 
> p.s.  The development version of this package is available at 
> "https://github.com/sbgraves237/Ecfun".
> 
> 
> https://cloud.r-project.org/web/checks/check_results_Ecfun.html
> 
> 
> -------- Forwarded Message --------
> Subject: 	CRAN package Ecfun
> Date: 	Tue, 21 Jan 2020 21:26:02 +0000
> From: 	Prof Brian Ripley <ripley using stats.ox.ac.uk>
> Reply-To: 	CRAN <CRAN using r-project.org>
> To: 	Spencer Graves <spencer.graves using effectivedefense.org>
> CC: 	CRAN <CRAN using r-project.org>
> 
> 
> 
> This has been intermittently failing its checks for a week: different 
> check runs failed (in the 24h prior to) the 14th, 15th, 17th and today. 
> The current failure is
> 
> Check: examples
> Result: ERROR
> Running examples in ‘Ecfun-Ex.R’ failed
> The error most likely occurred in:
> 
>> ### Name: read.testURLs
>> ### Title: Read a file produced by testURLs
>> ### Aliases: read.testURLs
>> ### Keywords: IO
>> 
>> ### ** Examples
>> 
>> # Test only 2 web sites, not the default 4,
>> # and test only twice, not the default 10 times:
>> tst <- testURLs(c(
> + PVI="http://en.wikipedia.org/wiki/Cook_Partisan_Voting_Index",
> + house="http://house.gov/representatives"),
> + n=2, maxFail=2)
> 1
> 1579634784, PVI, TRUE 0.828
> 1579634785, house, FALSE 0.051
> 1579634785, house, FALSE 0.048
> 2
> 1579634785, PVI, TRUE 0.043
> 1579634785, house, FALSE 0.11
> 1579634785, house, FALSE 0.035
>> 
>> # The above should have created a file 'testURLresults.csv'
>> # in the working directory. Read it.
>> 
>> dat <- read.testURLs()
> Error in read.table(file = file, header = header, sep = sep, quote = 
> quote, :
> more columns than column names
> Calls: read.testURLs -> read.csv -> read.table
> 
> That does not conform to the policy on Internet access, not least as no 
> attempt is made to check if the file was created, let alone that it has 
> the expected layout. Nor does it conform to the policy on not writing 
> anywhere in the file space (and that shows on its CRAN results page too).
> 
> Please correct ASAP and before Feb 4 to safely retain the package on CRAN.
> 
> -- 
> Brian D. Ripley,                  ripley using stats.ox.ac.uk
> Emeritus Professor of Applied Statistics, University of Oxford
> 
> 
> ______________________________________________
> R-package-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel

**********************
"The contents of this message do not reflect any position of the U.S. Government or NOAA."
**********************
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new street address***
110 McAllister Way
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: Roy.Mendelssohn using noaa.gov www: https://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected" 
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.