[R-pkg-devel] Automated checking defeated by website anti-scraping rules
    Hugh Parsonage 
    hugh.parsonage at gmail.com
       
    Sat Jun 14 02:33:55 CEST 2025
    
    
  
When checking a package on win-devel, I get the following NOTE:
Found the following (possibly) invalid URLs:
  URL: http://classic.austlii.edu.au/au/legis/cth/consol_act/itaa1997240/s4.10.html
    From: man/small_business_tax_offset.Rd
    Status: 410
    Message: Gone
  URL: http://classic.austlii.edu.au/au/legis/cth/consol_act/mla1986131/
    From: man/medicare_levy.Rd
    Status: 410
    Message: Gone
  URL: https://guides.dss.gov.au/social-security-guide/3/4/1/10
    From: man/age_pension_age.Rd
    Status: 403
    Message: Forbidden
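For what it's worth, those responses can be reproduced outside the check
machinery. A minimal sketch using the curl package (the status codes
reflect the servers' current anti-bot rules and may change; the
browser-style user agent string below is just an illustration):

library(curl)

urls <- c(
  "http://classic.austlii.edu.au/au/legis/cth/consol_act/itaa1997240/s4.10.html",
  "https://guides.dss.gov.au/social-security-guide/3/4/1/10"
)

# Plain GET; returns the HTTP status code the server sends back.
status_for <- function(url, ua = NULL) {
  h <- new_handle()
  if (!is.null(ua)) handle_setopt(h, useragent = ua)
  curl_fetch_memory(url, handle = h)$status_code
}

sapply(urls, status_for)                        # default libcurl user agent
sapply(urls, status_for,
       ua = "Mozilla/5.0 (X11; Linux x86_64)")  # browser-style user agent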
The URLs exist (changing them to https:// makes no difference) and are
accessible just fine from a browser. They appear to return those HTTP
statuses because the servers have decided to block 'automated
requests'. As imbecilic as these rules might be (they can probably be
defeated easily enough), what should the policy be going forward? I
can wrap these URLs in \code{} to get past the checks (sketched
below), but a better solution might be available at the check stage.
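For concreteness, the \code{} workaround would look something like
this in the affected Rd file (the \references{} block is hypothetical,
not the actual file contents; the point is only that \code{} renders
the address as plain text rather than a link for the checker to verify):

% man/age_pension_age.Rd -- sketch only
\references{
  % Checked by the URL checker and currently flagged with 403:
  \url{https://guides.dss.gov.au/social-security-guide/3/4/1/10}

  % Rendered as monospace text and not checked:
  \code{https://guides.dss.gov.au/social-security-guide/3/4/1/10}
}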
I think it is a good thing that the check flags a URL that has
genuinely gone or moved, and that behaviour should be preserved. I
don't just want to get past the check.
Hugh Parsonage.
    
    