[R-pkg-devel] URL checks

Ivan Krylov
Thu Jun 30 10:36:43 CEST 2022


Greg,

I realise you are trying to solve the problem and I thank you for
trying to make the URL checks better for everyone. I probably sound
defeatist in my e-mails; sorry about that.

On Thu, 30 Jun 2022 17:49:49 +1000
Greg Hunt <greg using firmansyah.com> wrote:

> Do you have evidence that even without the use of HEAD that
> CloudFlare is rejecting the CRAN checks?

Unfortunately, yes, I think it's possible:

$ curl -v https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
# ...skipping TLS logs...
> GET /hc/en-us/articles/219949047-Installing-older-versions-of-packages HTTP/2 
> Host: support.rstudio.com
> User-Agent: curl/7.64.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 256)!
< HTTP/2 403
< date: Thu, 30 Jun 2022 08:13:01 GMT

CloudFlare blocks are probabilistic. I *think* I got a 403 because I
hadn't visited the page with my browser first. Switching from HEAD to
GET might also increase the traffic volume, leading to more blocks from
hosts that did not previously block the HEAD requests.
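
For illustration, here is a minimal sketch of a check that tries HEAD
first and spends a GET only when HEAD is refused, which would keep the
extra traffic down. I am not claiming the CRAN checks work this way;
check_url(), fetch_status() and the set of retry codes are names and
values made up for the example.

```python
import urllib.error
import urllib.request

# Assumption: statuses for which a refused HEAD is worth retrying as GET.
RETRY_WITH_GET = {403, 405, 501}


def fetch_status(method, url, timeout=10):
    """Return the HTTP status code for one request (redirects are followed)."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is still useful.
        # Network-level failures (URLError) are deliberately left to propagate.
        return err.code


def check_url(url, fetch=fetch_status):
    """Try HEAD; only if the server refuses it, spend a GET on the same URL."""
    status = fetch("HEAD", url)
    if status in RETRY_WITH_GET:
        status = fetch("GET", url)
    return status
```

The fetch argument exists only so the logic can be exercised without
touching the network. A CDN that blocks both methods still yields a 403
here, as in the curl output above.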

CloudFlare's suggested solution would be Private Access Tokens [*], but
that looks hard to implement (who would agree to sign those tokens?)
and does nothing for other CDNs.

> The CDN rejecting requests or flagging the service as temporarily
> unavailable when there is a communication failure with its upstream
> server is much the same behaviour that you would expect to see from
> the usual kinds of protection that you'd apply to a web server (some
> kind of filter/proxy/firewall) even without a CDN in place.

My point was different. If the upstream is actually down, the page
can't be served even to "valid" users, and the 503 error from
CloudFlare should fail the URL check. On the other hand, if the 503
error is due to the check tripping a bot detector, it could be
reasonable to give that page a free pass.

How can we distinguish those two situations? Could CloudFlare ask for a
CAPTCHA first, then realise that the upstream is down and return
another 503?

Yes, this is a sensitivity vs specificity question, and we can trade
some false positives (that we get now) for some false negatives
(letting a legitimate error status from a CDN pass the test) to make
life easier for package maintainers. Your suggestions are a step in the
right direction, but there has to be a way to make them less fragile.
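
To make the trade-off concrete, here is a rough sketch of a "free
pass" rule. It rests on two assumptions I have not verified against
real traffic: that a response served by CloudFlare says so in its
Server header, and that a bot challenge is marked with a
"cf-mitigated: challenge" header. classify() and its return values are
invented for the example, and the rule does not settle the ambiguous
503 case discussed above.

```python
def classify(status, headers):
    """Return 'ok', 'free pass' or 'fail' for one URL check result."""
    # Header names are case-insensitive, so normalise before looking them up.
    h = {k.lower(): v.lower() for k, v in headers.items()}
    if 200 <= status < 400:
        return "ok"
    # Assumption: CloudFlare identifies itself in the Server header.
    behind_cdn = "cloudflare" in h.get("server", "")
    # Assumption: a bot challenge carries "cf-mitigated: challenge".
    challenged = h.get("cf-mitigated") == "challenge"
    if behind_cdn and (challenged or status == 403):
        # Probably the bot detector. This is where false negatives come from:
        # a genuine 403 from the upstream would get a free pass too.
        return "free pass"
    # Everything else fails, including a 503 with no challenge marker.
    return "fail"
```

With this rule a plain 503 from CloudFlare still fails the check, while
a 403 like the one in the curl output above would be let through.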

-- 
Best regards,
Ivan

[*]
https://blog.cloudflare.com/eliminating-captchas-on-iphones-and-macs-using-new-standard/


