[Rd] download.file(method="libcurl") and GET vs. HEAD requests

Winston Chang winstonchang1 at gmail.com
Wed Jun 22 18:01:39 CEST 2016


Thanks for looking into it. Is there a way to avoid the HEAD request
in R 3.3.0? I'm asking because if there isn't, then I'll add a
workaround in a package I'm working on.

-Winston
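
A minimal sketch of what such a package-level workaround might look like, assuming (as reported in the quoted message below) that method="curl" and method="wget" still send a plain GET in R 3.3.0. The name download_with_fallback and its `methods` and `downloader` arguments are hypothetical, not part of base R; the external curl or wget program must be installed for the fallbacks to work.

```r
# Hypothetical helper (not part of base R): try method = "libcurl" first and,
# if that fails, retry with the external "curl" and "wget" methods.
# `downloader` defaults to utils::download.file and exists only so the
# function can be exercised without network access.
download_with_fallback <- function(url, destfile,
                                   methods = c("libcurl", "curl", "wget"),
                                   downloader = utils::download.file, ...) {
  for (m in methods) {
    # download.file() returns 0 on success; failures surface either as an
    # error or as a non-zero status accompanied by a warning.
    status <- tryCatch(
      suppressWarnings(downloader(url, destfile, method = m, ...)),
      error = function(e) 1L
    )
    if (identical(as.integer(status), 0L)) {
      return(invisible(m))  # report which method succeeded
    }
    # Remove any partial or empty file before trying the next method.
    unlink(destfile)
  }
  stop("all download methods failed for ", url)
}
```

Called as download_with_fallback(url, destfile = "out.zip"), it returns (invisibly) the name of the method that worked. This only sidesteps the HEAD request by switching methods; it does not change what method="libcurl" itself sends.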

On Tue, Jun 21, 2016 at 9:45 PM, Martin Morgan
<martin.morgan at roswellpark.org> wrote:
> On 06/21/2016 09:35 PM, Winston Chang wrote:
>>
>> In R 3.2.4, download.file(method="libcurl") issues a single HTTP GET
>> request for the file. In R 3.3.0, however, it issues an HTTP HEAD
>> request first, and then a GET request. This causes problems when the
>> web server returns an error for the HEAD request, even though the
>> file is available with a GET request.
>>
>> Is it possible to tell download.file to simply send a GET request,
>> without first sending a HEAD request?
>>
>>
>> In theory, web servers should give the same response for HEAD and GET
>> requests, except that the response to a HEAD request contains only the
>> headers and not the content. However, not all web servers do this for
>> all files. I've seen this problem come up in two different places.
>>
>> The first is from an issue that someone filed for the downloader
>> package. The following works in R 3.2.4, but in R 3.3.0, it fails with
>> a 404 (tested on a Mac):
>>    options(internet.info=1) # Show verbose download info
>>    url <- "https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011_gen.zip"
>>    download.file(url, destfile = "out.zip", method="libcurl")
>>
>> In R 3.3.0, the download succeeds with method="wget" and with
>> method="curl". Only method="libcurl" has problems.
>>
>>
>> The second place I've encountered a problem is in downloading attached
>> files from a GitHub release.
>>    options(internet.info=1) # Show verbose download info
>>    url <- "https://github.com/wch/webshot/releases/download/v0.3/phantomjs-2.1.1-macosx.zip"
>>    download.file(url, destfile = "out.zip")
>>
>> This one fails with a 403 Forbidden because it gets redirected to a
>> URL in Amazon S3, where a signature of the file is embedded in the
>> URL. However, the signature is computed with the request type (HEAD
>> vs. GET), and so the same URL doesn't work for both. (See
>> http://stackoverflow.com/a/20580036/412655)
>>
>> Any help would be appreciated!
>
>
> I think I introduced this, in
>
> ------------------------------------------------------------------------
> r69280 | morgan | 2015-09-03 06:24:49 -0400 (Thu, 03 Sep 2015) | 4 lines
>
> don't create empty file on 404 and similar errors
>
> - download.file(method="libcurl")
>
> ------------------------------------------------------------------------
>
> The idea was to test that the file can be downloaded before trying to
> download it; previously R would download the error page as though it were
> the content.
>
> I'll give this some thought.
>
> Martin Morgan
>
>
>> -Winston
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>


