[Rd] read.table() fails with https in R 3.6 but not in R 3.5

Stephen Berman stephen.berman at gmx.net
Mon May 6 14:27:17 CEST 2019


On Mon, 6 May 2019 11:12:25 +0200 Ralf Stubner <ralf.stubner using daqana.com> wrote:

> On 04.05.19 19:04, Stephen Berman wrote:
>> In versions of R prior to 3.6.0 the following invocation succeeds,
>> returning the data frame shown:
>>
>> > read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text",
>> +            header=TRUE)
>>    Dekade   Anzahl
>> 1    1900 11467254
>> 2    1910 13023370
>> 3    1920 13434601
>> 4    1930 13296355
>> 5    1940 12121250
>> 6    1950 13191131
>> 7    1960 10587420
>> 8    1970 10944129
>> 9    1980 11279439
>> 10   1990 12052652
>>
>> But in version 3.6.0 it fails:
>>
>> > read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text",
>> +            header=TRUE)
>> Error in file(file, "rt") :
>>   cannot open the connection to
>> 'https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text'
>> In addition: Warning message:
>> In file(file, "rt") :
>>   cannot open URL
>> 'https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text':
>> HTTP status was '403 Forbidden'
>
> I can reproduce the behavior on Debian using the CRAN supplied package
> for R 3.6.0. Trying to read the page with 'curl' also produces a 403
> error plus some HTML text (in German) explaining that I am treated as a
> 'robot' due to the supplied User-Agent (here: curl/7.52.1). One
> suggested solution is to adjust that value which does solve the issue:
>
>  > options(HTTPUserAgent='mozilla')

I can confirm that this works for me, too.  Thanks!  FWIW, the default
value of HTTPUserAgent in R 3.6 here is "R (3.6.0 x86_64-suse-linux-gnu
x86_64 linux-gnu)", and with it the call fails as I reported.  In R 3.5
the default here is "R (3.5.0 x86_64-suse-linux-gnu x86_64 linux-gnu)",
and with that value the call succeeds.  However, after setting
HTTPUserAgent in R 3.5 to "libcurl/7.60.0", the call fails just as it
does in 3.6.  It's not clear to me whether this particular website is
being too restrictive or whether R 3.6 should deal with it, or at least
mention the issue in NEWS or somewhere else.
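
For anyone who wants to try the workaround without changing the option
for the whole session, here is a minimal sketch.  The helper names
with_user_agent() and read_dwds_counts() are made up for illustration
and are not part of R; the "mozilla" value is simply the one suggested
above, and whether the server keeps accepting it is an assumption.

```r
# Hypothetical helper (not part of R): evaluate `expr` while the
# HTTPUserAgent option is temporarily set to `agent`, then restore
# whatever value was in effect before.
with_user_agent <- function(agent, expr) {
  old <- options(HTTPUserAgent = agent)  # returns the previous setting
  on.exit(options(old), add = TRUE)      # restore on exit, even on error
  force(expr)                            # the promise is evaluated only now
}

# Hypothetical wrapper around the call from this thread.  It needs
# network access and assumes the site accepts the given User-Agent.
read_dwds_counts <- function(agent = "mozilla") {
  url <- "https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text"
  with_user_agent(agent, read.table(url, header = TRUE))
}

# Show the value currently in effect for this session.
print(getOption("HTTPUserAgent"))
```

Because R evaluates function arguments lazily, the read.table() call
only runs once the option has been changed inside with_user_agent(),
and the previous User-Agent is back in place as soon as it returns.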

Steve Berman
