[Rd] download.file(method="libcurl") and GET vs. HEAD requests
Winston Chang
winstonchang1 at gmail.com
Wed Jun 22 03:35:30 CEST 2016
In R 3.2.4, download.file(method="libcurl") issues an HTTP GET request
for the file. In R 3.3.0, however, it issues an HTTP HEAD request
first, and then a GET request. This causes problems when the web
server returns an error for the HEAD request even though the file is
available with a GET request.
Is it possible to tell download.file to simply send a GET request,
without first sending a HEAD request?
In theory, a web server should give the same response to HEAD and GET
requests, except that for a HEAD request it sends only the headers and
not the content. However, not all web servers do this for all files.
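One way to check whether a given server treats the two request types differently is to compare the status codes directly. The following is only a sketch: it assumes the curl package is installed, and the helper names curl_status and compare_head_get are made up for illustration. Note that the GET branch reads the whole response body into memory.

```r
# Hypothetical helper: return the HTTP status code for a HEAD or GET request.
# Assumes the 'curl' package is available.
curl_status <- function(url, head_only) {
  h <- curl::new_handle()
  if (head_only) {
    # libcurl's CURLOPT_NOBODY turns the request into a HEAD request
    curl::handle_setopt(h, nobody = TRUE)
  }
  curl::curl_fetch_memory(url, handle = h)$status_code
}

# Hypothetical helper: status codes for both request types, side by side.
# 'status_for' can be swapped out, e.g. for a stub when no network is available.
compare_head_get <- function(url, status_for = curl_status) {
  c(HEAD = status_for(url, head_only = TRUE),
    GET  = status_for(url, head_only = FALSE))
}
```

Calling compare_head_get(url) on a problematic URL would be expected to show an error code under HEAD and 200 under GET if the server behaves as described.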
I've seen this problem come up in two different places.
The first is from an issue that someone filed for the downloader
package. The following works in R 3.2.4, but in R 3.3.0, it fails with
a 404 (tested on a Mac):
options(internet.info=1) # Show verbose download info
url <- "https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011_gen.zip"
download.file(url, destfile = "out.zip", method="libcurl")
In R 3.3.0, the download succeeds with method="wget" and with
method="curl". Only method="libcurl" has problems.
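Since the other methods still work, one possible stopgap is to try method="libcurl" first and fall back to the others when it fails. This is a sketch rather than a fix for the underlying behaviour: download_with_fallback is a hypothetical helper, it treats any warning or error as a failure, and it assumes the curl and wget command-line tools are installed for the fallback methods.

```r
# Hypothetical helper: try each download method in turn and return the name
# of the first one that succeeds. 'downloader' defaults to download.file and
# is only a parameter so that it can be replaced by a stub for testing.
download_with_fallback <- function(url, destfile,
                                   methods = c("libcurl", "curl", "wget"),
                                   downloader = utils::download.file) {
  for (m in methods) {
    ok <- tryCatch({
      status <- downloader(url, destfile = destfile, method = m, mode = "wb")
      # download.file returns 0 (invisibly) on success
      identical(as.integer(status), 0L)
    },
    warning = function(w) FALSE,
    error = function(e) FALSE)
    if (ok) return(m)
  }
  stop("all download methods failed for ", url)
}
```

For example, download_with_fallback(url, "out.zip") would return the name of the method that ended up fetching the file. A failed attempt may leave a partial destfile behind, which the next attempt overwrites.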
The second place I've encountered a problem is in downloading attached
files from a GitHub release.
options(internet.info=1) # Show verbose download info
url <- "https://github.com/wch/webshot/releases/download/v0.3/phantomjs-2.1.1-macosx.zip"
download.file(url, destfile = "out.zip")
This one fails with a 403 Forbidden because the request gets redirected
to a URL on Amazon S3 that has a signature embedded in it. The
signature is computed using the request type (HEAD vs. GET), so the
same URL doesn't work for both. (See
http://stackoverflow.com/a/20580036/412655)
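To illustrate why a signed URL is tied to one request type, here is a toy string-to-sign, loosely modelled on S3 query-string authentication. It is a simplification and an assumption, not the exact scheme used for the GitHub redirect: a real signature is an HMAC of a string like this under a secret key, and the resource path and expiry value below are made up.

```r
# Toy string-to-sign: the HTTP verb is the first field, followed by two
# empty fields (standing in for Content-MD5 and Content-Type), the expiry
# time, and the resource path. All values here are hypothetical.
string_to_sign <- function(verb, resource, expires) {
  paste(verb, "", "", expires, resource, sep = "\n")
}

get_string  <- string_to_sign("GET",  "/some-bucket/file.zip", 1466559330)
head_string <- string_to_sign("HEAD", "/some-bucket/file.zip", 1466559330)

# The two strings differ in the verb, so any signature derived from them
# differs too, and a URL signed for GET is rejected when used with HEAD.
identical(get_string, head_string)
```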
Any help would be appreciated!
-Winston