[R-pkg-devel] URL checks

Greg Hunt greg @end|ng |rom ||rm@n@y@h@com
Thu Jun 30 01:56:28 CEST 2022


Ben,
It looks like, from the link you supplied, that for doi.org URLs you can
also remove the -L from the curl invocation and check the HTTP response
directly: a redirect (302) tells you the target ID is valid, because the
resolver is documented to return a not found status (404) for non-existing
IDs (and indeed that's what it did when I faked an ID).
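
A minimal Python sketch of the check described above, using only the
standard library instead of curl. The helper names (classify_doi_status,
doi_status) are hypothetical, and treating redirect codes other than 302 as
"valid" is an assumption; only 302 and 404 are reported above. Anything else
(for example a 403 or 503 from blocking) is deliberately left inconclusive.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects, like curl without -L."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib surface the 3xx as an HTTPError.
        return None


def classify_doi_status(status):
    """Turn the resolver's HTTP status into a verdict (hypothetical helper)."""
    if status in (301, 302, 303, 307, 308):
        # 302 is the code reported above; the other 3xx codes are an assumption.
        return "valid"
    if status == 404:
        # Documented response for non-existing DOIs.
        return "not found"
    # 403, 503 and so on say nothing about the DOI itself.
    return "unknown"


def doi_status(doi, timeout=10):
    """GET https://doi.org/<doi> without following redirects; return the status."""
    opener = urllib.request.build_opener(NoRedirect)
    req = urllib.request.Request("https://doi.org/" + quote(doi), method="GET")
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling classify_doi_status(doi_status(doi)) for a DOI string of your
choosing gives the verdict. The request uses GET rather than HEAD, since HEAD
(curl -I) is what was getting rejected in the earlier experiment.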

Greg

On Thu, 30 Jun 2022 at 09:45, Ben Bolker <bbolker using gmail.com> wrote:

>    I seem to recall that someone else suggested a related solution to me
> (using something from
> https://www.doi.org/doi_handbook/3_Resolution.html#3.8.1 to get
> information about validity without actually trying to redirect?);
> unfortunately, I can't dig out either their name or their specific
> suggestion at the moment.
>
>    Ben Bolker
>
> On 6/29/22 7:34 PM, Greg Hunt wrote:
> > With a little experimentation, the problem seems to be the -I switch in
> > curl, with which the request uses the HTTP HEAD method instead of GET.
> > Without -I, the requests from curl work for me; with -I, I get negative
> > responses (403, 503).
> >
> > While HEAD does not represent a major security threat to a server, it gets
> > caught up when people disable unused or unnecessary operations and
> > features, so to a first approximation the problem is -I.  Now, there
> > may also be scraper blocking applied to the CRAN and WinBuilder
> > infrastructure by the CDN companies, because scraping is a large problem
> > for many websites, but detecting rejection by Cloudflare may be possible
> > if that is what is happening.
> >
> >
> > On Thu, 30 Jun 2022 at 07:08, Ivan Krylov <krylov.r00t using gmail.com> wrote:
> >
> >> On Wed, 29 Jun 2022 22:51:23 +0200 (CEST)
> >> William Becker <william.becker using bluefoxdata.eu> wrote:
> >>
> >>> if someone can point me to a reference where I can work out how to
> >>> solve the problem, that would be really helpful
> >>
> >> The CRAN URL checks page is linked from the CRAN policy:
> >> https://cran.r-project.org/web/packages/URL_checks.html
> >>
> >> Short version is, your links all seem fine, but Cloudflare and other
> >> content distribution companies don't like the way R checks them. You
> >> only need to mention that in the submission comment.
> >>
> >> There's no technical fix for the problem. DOI checks could be in theory
> >> adjusted (trading false negatives for false positives), but it's hard to
> >> get checks working for the rest of the links when the CDN companies
> >> decide that automated requests like ones produced by R CMD check are
> >> exactly the kind of thing they should be blocking.
> >>
> >> --
> >> Best regards,
> >> Ivan
> >>
> >> ______________________________________________
> >> R-package-devel using r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-package-devel
> >>
>
> --
> Dr. Benjamin Bolker
> Professor, Mathematics & Statistics and Biology, McMaster University
> Director, School of Computational Science and Engineering
> Graduate chair, Mathematics & Statistics
