[Rd] read.table() fails with https in R 3.6 but not in R 3.5
Stephen Berman
Sat May 4 19:04:05 CEST 2019
In versions of R prior to 3.6.0 the following invocation succeeds,
returning the data frame shown:
> read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text", header=TRUE)
   Dekade   Anzahl
1    1900 11467254
2    1910 13023370
3    1920 13434601
4    1930 13296355
5    1940 12121250
6    1950 13191131
7    1960 10587420
8    1970 10944129
9    1980 11279439
10   1990 12052652
But in version 3.6.0 it fails:
> read.table("https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text", header=TRUE)
Error in file(file, "rt") :
  cannot open the connection to 'https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text'
In addition: Warning message:
In file(file, "rt") :
  cannot open URL 'https://www.dwds.de/r/stat?corpus=kern&cnt=tokens&date=decade&format=text': HTTP status was '403 Forbidden'
The table at this URL is generated by a query processor, and the same
failure occurs in 3.6.0 with other queries to this website. The website
does not appear to serve data via http: replacing https with http in the
above gives the same results, except that in 3.6.0 the error message
shows the URL with http while the warning message shows it with https.
I have also tried a few other websites that serve (non-generated)
tabular data via https
(e.g. https://graphchallenge.s3.amazonaws.com/synthetic/gc3/Theory-16-25-81-Bk.tsv),
and with these read.table() succeeds in 3.6.0, so the problem isn't
https in general. Maybe it has to do with the page being generated
rather than static? There's only one reference to https in the 3.6.0
NEWS, concerning libcurl; I can't tell whether it's relevant.
In case it matters, this is with R packaged for openSUSE, and I've found
the above difference between 3.5 and 3.6 on both openSUSE Leap 15.0 and
openSUSE Tumbleweed.
Steve Berman