[R-pkg-devel] Automated checking defeated by website anti-scraping rules
Iris Simmons
|kw@|mmo @end|ng |rom gm@||@com
Sat Jun 14 02:38:46 CEST 2025
I have a package that throws the same NOTE when checked; the CRAN
maintainers just let it pass every time. I wouldn't worry about it.
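
If you want to confirm it really is the anti-scraping rules rather than a
dead link, a quick sketch along these lines (using the curl package, and
assuming the block keys on the User-Agent header, which it may not; the
browser UA string below is made up) compares the status a plain request
gets with the status a browser-looking request gets:

library(curl)

url <- "https://guides.dss.gov.au/social-security-guide/3/4/1/10"

## default libcurl user agent, roughly what an automated checker sends
plain <- curl_fetch_memory(url)

## same request, but pretending to be a desktop browser
h <- new_handle(useragent = paste(
  "Mozilla/5.0 (X11; Linux x86_64; rv:127.0)",
  "Gecko/20100101 Firefox/127.0"))
browserish <- curl_fetch_memory(url, handle = h)

c(default_ua = plain$status_code, browser_ua = browserish$status_code)

If the second request comes back 200 while the first gets 403, the link is
fine and the server is simply refusing clients that don't look like a
browser. (If I remember right, urlchecker::url_check() will rerun the URL
checks locally, which at least makes it quick to see which NOTEs are
reproducible before submitting.)
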
On Fri, Jun 13, 2025, 20:35 Hugh Parsonage <hugh.parsonage using gmail.com> wrote:
> When checking a package on win-devel, I get the NOTE
>
> Found the following (possibly) invalid URLs:
> URL:
> http://classic.austlii.edu.au/au/legis/cth/consol_act/itaa1997240/s4.10.html
> From: man/small_business_tax_offset.Rd
> Status: 410
> Message: Gone
> URL: http://classic.austlii.edu.au/au/legis/cth/consol_act/mla1986131/
> From: man/medicare_levy.Rd
> Status: 410
> Message: Gone
> URL: https://guides.dss.gov.au/social-security-guide/3/4/1/10
> From: man/age_pension_age.Rd
> Status: 403
> Message: Forbidden
>
> The URLs exist (changing to https:// changes nothing) and are
> accessible from a browser just fine. They appear to have those HTTP
> statuses because of the servers' decision to block 'automated
> requests'. As imbecilic as these rules might be (they can probably be
> easily defeated), what should be the policy going forward? I can wrap
> these URLs in \code{} to get past the checks, but a better solution
> might be available at the check stage.
>
> I think the fact that a check fails when a URL really has failed or
> moved is a good thing and should be preserved. I don't just want to
> get past the check.
>
>
> Hugh Parsonage.
>