[Rd] URL checks
J C Nash
pro|jcn@@h @end|ng |rom gm@||@com
Mon Jan 11 15:41:49 CET 2021
Sorry, Martin, but I've NOT commented on this matter, unless someone has been impersonating me.
Someone else?
JN
On 2021-01-11 4:51 a.m., Martin Maechler wrote:
>>>>>> Viechtbauer, Wolfgang (SP)
>>>>>> on Fri, 8 Jan 2021 13:50:14 +0000 writes:
>
> > Instead of a separate file to store such a list, would it be an idea to add versions of the \href{}{} and \url{} markup commands that are skipped by the URL checks?
> > Best,
> > Wolfgang
>
> I think John Nash and you misunderstood -- or else I
> misunderstood -- the original proposal:
>
> My understanding was that there should be a "central repository" of URL
> exceptions that is maintained by volunteers.
>
> And rather *not* that package authors should get ways to skip
> URL checking.
>
> Martin
>
>
> >> -----Original Message-----
> >> From: R-devel [mailto:r-devel-bounces using r-project.org] On Behalf Of Spencer
> >> Graves
> >> Sent: Friday, 08 January, 2021 13:04
> >> To: r-devel using r-project.org
> >> Subject: Re: [Rd] URL checks
> >>
> >> I also would be pleased to be allowed to provide "a list of known
> >> false-positive/exceptions" to the URL tests. I've been challenged
> >> multiple times regarding URLs that worked fine when I checked them. We
> >> should not be required to do a partial lobotomy to pass R CMD check ;-)
> >>
> >> Spencer Graves
> >>
> >> On 2021-01-07 09:53, Hugo Gruson wrote:
> >>>
> >>> I encountered the same issue today with https://astrostatistics.psu.edu/.
> >>>
> >>> This is a trust chain issue, as explained here:
> >>> https://whatsmychaincert.com/?astrostatistics.psu.edu.
> >>>
> >>> I've worked for a couple of years on a project to increase HTTPS
> >>> adoption on the web, and we noticed that this type of error is very
> >>> common and that website maintainers are often unresponsive to requests
> >>> to fix this issue.
> >>>
> >>> Therefore, I totally agree with Kirill that a list of known
> >>> false-positive/exceptions would be a great addition, saving time for both
> >>> the CRAN team and package developers.
> >>>
> >>> Hugo
> >>>
> >>> On 07/01/2021 15:45, Kirill Müller via R-devel wrote:
> >>>> One other failure mode: SSL certificates that browsers trust but that
> >>>> are not installed on the check machine, e.g. the "GEANT Vereniging"
> >>>> certificate from https://relational.fit.cvut.cz/ .
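Certificate failures like this one can be told apart from HTTP-level errors, which would let a checker report them separately. The Python sketch below illustrates the idea under stated assumptions and is not how R CMD check works. It fetches a URL with the local default trust store, or only with the CA bundle passed through the hypothetical `cafile` argument, and labels the outcome. `probe` and `classify_failure` are hypothetical names.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def classify_failure(exc: Exception) -> str:
    """Separate certificate-verification problems from HTTP-level failures."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return f"http {exc.code}"
    # urllib wraps TLS errors raised while connecting in URLError.reason.
    if isinstance(exc, urllib.error.URLError) and isinstance(
        exc.reason, ssl.SSLCertVerificationError
    ):
        return "certificate"
    return "other"


def probe(url: str, cafile: Optional[str] = None, timeout: float = 10.0) -> str:
    """Fetch `url` and classify the result (requires network access).

    With cafile=None the system's default CA certificates are used;
    otherwise only the certificates in `cafile` are trusted.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context):
            return "ok"
    except OSError as exc:  # URLError, HTTPError and timeouts are all OSErrors
        return classify_failure(exc)
```

A result of `"certificate"` points at the check machine's trust store or the server's chain, not at a broken link. That difference is what makes such failures poor candidates for a NOTE.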
> >>>>
> >>>> K
> >>>>
> >>>> On 07.01.21 12:14, Kirill Müller via R-devel wrote:
> >>>>> Hi
> >>>>>
> >>>>> The URL checks in R CMD check test all links in the README and
> >>>>> vignettes for broken or redirected links. In many cases this improves
> >>>>> documentation, but I see problems with this approach, which I have
> >>>>> detailed below.
> >>>>>
> >>>>> I'm writing to this mailing list because I think the change needs to
> >>>>> happen in R's check routines. I propose to introduce an "allow-list"
> >>>>> for URLs, to reduce the burden on both CRAN and package maintainers.
> >>>>>
> >>>>> Comments are greatly appreciated.
> >>>>>
> >>>>> Best regards
> >>>>>
> >>>>> Kirill
> >>>>>
> >>>>> # Problems with the detection of broken/redirected URLs
> >>>>>
> >>>>> ## 301 should often be 307, how to change?
> >>>>>
> >>>>> Many web sites use a 301 redirection code that probably should be a
> >>>>> 307. For example, https://www.oracle.com and https://www.oracle.com/
> >>>>> both redirect to https://www.oracle.com/index.html with a 301. I
> >>>>> suspect the company still wants oracle.com to be recognized as the
> >>>>> primary entry point of their web presence (to reserve the right to
> >>>>> move the redirection to a different location later), though I haven't
> >>>>> checked with their PR department. If that's true, the redirect
> >>>>> probably should be a 307; fixing it would be up to their IT
> >>>>> department, which I haven't contacted either.
> >>>>>
> >>>>> $ curl -i https://www.oracle.com
> >>>>> HTTP/2 301
> >>>>> server: AkamaiGHost
> >>>>> content-length: 0
> >>>>> location: https://www.oracle.com/index.html
> >>>>> ...
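One way to make the 301-versus-307 distinction concrete: a checker might ask maintainers to update a link only for permanent redirects (301, 308). Temporary redirects (302, 303, 307) leave the original URL canonical. The Python sketch below assumes that policy. `classify_status` and `first_status` are hypothetical names, and the code does not reproduce R CMD check's actual logic.

```python
import urllib.error
import urllib.request

# Permanent redirects ask clients to update stored links; temporary ones do not.
PERMANENT_REDIRECTS = {301, 308}
TEMPORARY_REDIRECTS = {302, 303, 307}


def classify_status(code: int) -> str:
    """Map an HTTP status code to a category a link checker might report."""
    if 200 <= code < 300:
        return "ok"
    if code in PERMANENT_REDIRECTS:
        return "permanent redirect"  # a checker could suggest the new URL
    if code in TEMPORARY_REDIRECTS:
        return "temporary redirect"  # the original URL stays canonical
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the first status is visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError instead


def first_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code of the first response for `url` (needs network)."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx responses surface as HTTPError.
        return err.code
```

Calling `first_status("https://www.oracle.com")` would report the first status code without following the redirect. The result depends on the server's configuration at the time of the request.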
> >>>>>
> >>>>> ## User agent detection
> >>>>>
> >>>>> twitter.com responds with a 400 error for requests without a user
> >>>>> agent string hinting at an accepted browser.
> >>>>>
> >>>>> $ curl -i https://twitter.com/
> >>>>> HTTP/2 400
> >>>>> ...
> >>>>> <body>...<p>Please switch to a supported browser...</p>...</body>
> >>>>>
> >>>>> $ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1
> >>>>> HTTP/2 200
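The same user-agent workaround can be sketched with Python's standard `urllib.request` module. This is an illustration built on assumptions. The browser string is the one from the curl command above, and `build_request` and `status_with_agent` are hypothetical helpers. Servers may also change how they treat user agents at any time.

```python
import urllib.error
import urllib.request
from typing import Optional

# Browser-like string taken from the curl example; any current browser UA would do.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:84.0) "
    "Gecko/20100101 Firefox/84.0"
)


def build_request(url: str, user_agent: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, optionally overriding the User-Agent header.

    Without an explicit header, urllib sends its default "Python-urllib/3.x" agent.
    """
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def status_with_agent(url: str, user_agent: Optional[str] = None,
                      timeout: float = 10.0) -> int:
    """Return the final HTTP status code for `url` (requires network access)."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Comparing `status_with_agent(url)` with `status_with_agent(url, BROWSER_UA)` separates agent-sensitive sites from broken links.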
> >>>>>
> >>>>> # Impact
> >>>>>
> >>>>> While the latter problem *could* be fixed by supplying a browser-like
> >>>>> user agent string, the former problem is virtually unfixable -- far too
> >>>>> many web sites use 301 where they should use 307. The above
> >>>>> list is also incomplete -- think of unreliable links, HTTP links,
> >>>>> other failure modes...
> >>>>>
> >>>>> This affects me as a package maintainer: I have the choice to either
> >>>>> change the links to incorrect versions or remove them altogether.
> >>>>>
> >>>>> I can also choose to explain each broken link to CRAN, but I think this
> >>>>> subjects the team to undue burden. Submitting a package with NOTEs
> >>>>> delays the release, and I must release this package very soon to
> >>>>> avoid having it pulled from CRAN. I'd rather not risk that -- hence I
> >>>>> need to remove the link and put it back later.
> >>>>>
> >>>>> I'm aware of https://github.com/r-lib/urlchecker; it alleviates the
> >>>>> problem but ultimately doesn't solve it.
> >>>>>
> >>>>> # Proposed solution
> >>>>>
> >>>>> ## Allow-list
> >>>>>
> >>>>> A file inst/URL that lists all URLs where failures are allowed --
> >>>>> possibly with a list of the HTTP codes accepted for that link.
> >>>>>
> >>>>> Example:
> >>>>>
> >>>>> https://oracle.com/ 301
> >>>>> https://twitter.com/drob/status/1224851726068527106 400
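The proposed inst/URL format is simple enough to parse in a few lines. The Python sketch below makes several illustrative assumptions that are not part of the proposal:

- fields are whitespace-separated;
- each URL may carry zero or more accepted status codes, and zero codes means any failure is accepted;
- lines starting with `#` are comments;
- URLs are matched as exact strings.

The names `parse_allow_list` and `is_allowed` are hypothetical.

```python
from typing import Dict, Optional, Set

AllowList = Dict[str, Optional[Set[int]]]


def parse_allow_list(text: str) -> AllowList:
    """Parse 'URL [CODE ...]' lines into a URL -> accepted-codes mapping.

    A URL listed without codes maps to None, meaning any failure is accepted.
    """
    allowed: AllowList = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        url, *codes = line.split()
        allowed[url] = {int(c) for c in codes} if codes else None
    return allowed


def is_allowed(allowed: AllowList, url: str, status: int) -> bool:
    """Return True if a failing `status` for `url` should be suppressed."""
    if url not in allowed:
        return False
    codes = allowed[url]
    return codes is None or status in codes
```

Because matching is exact, https://www.oracle.com would not be covered by the https://oracle.com/ entry. A real implementation would need to decide how, or whether, to normalize URLs.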
> > ______________________________________________
> > R-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>