[Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

Thu Aug 27 01:07:23 CEST 2015

On 26/08/2015 6:04 PM, Jeroen Ooms wrote:
> On Tue, Aug 25, 2015 at 10:33 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
>>
>> Actually, I don't know that it does. It addresses the symptom, but I think there should be an error from libcurl on the 403 / 404 rather than from read.dcf on the error page...
> 
> Indeed, the only correct behavior is to turn the protocol error code
> into an R exception. When the server returns a status code >= 400, it
> indicates that the request was unsuccessful and the response body does
> not contain the content the client had requested, but should instead
> be interpreted as an error message/page. Ignoring this fact and
> proceeding with parsing the body as usual is incorrect and leads to
> all kinds of strange errors downstream.

Yes.  I haven't been following this long thread.  Is it only in R-devel,
or is this happening in 3.2.2 or R-patched?

If the latter, please submit a bug report.  If it is only R-devel,
please just be patient.  When R-devel becomes R-alpha next year, if the
bug still exists, please report it.

Duncan Murdoch

> 
> The other download methods did this correctly; it is unclear why the
> current implementation of the "libcurl" method does not. Not only does
> it lead to hard-to-interpret downstream parsing errors, it also makes
> the behavior of R ambiguous as it is dependent on which download
> method is in use. It is certainly not a limitation of the libcurl
> library: the 'curl' package has alternative implementations of url()
> and download.file() which exercise the correct behavior.
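
The rule described above (any status code >= 400 becomes an R error instead of a parsed body) can be sketched in a few lines. This is only an illustration, not the code used by R or by the 'curl' package: the helper name stop_for_http_status, the condition class "http_error", and the message wording are all assumptions made for this example.

```r
# Hypothetical helper (not part of base R or of the 'curl' package):
# signal a classed R error for any HTTP status code >= 400.
stop_for_http_status <- function(status, url) {
  status <- as.integer(status)
  if (status >= 400L) {
    cond <- structure(
      class = c("http_error", "error", "condition"),
      list(
        message = sprintf("cannot open URL '%s': HTTP status was %d",
                          url, status),
        call = NULL,
        status = status
      )
    )
    stop(cond)
  }
  # Successful responses pass through; the caller may now parse the body.
  invisible(status)
}

# A caller can handle the failure by class instead of parsing an error page.
result <- tryCatch(
  stop_for_http_status(404L, "https://someserver.com/mydata.csv"),
  http_error = function(e) conditionMessage(e)
)
```

Because the condition inherits from "error", code that does not install a handler simply stops, which is the default behavior argued for here.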
> 
> I can only speculate, but if the motivation is to explicitly support
> retrieval of error pages, perhaps the download.file() and url()
> functions can gain an argument 'stop_on_error' or something similar
> that gives the user an option to ignore server errors. However, this
> behavior should certainly not be the default. When a function or
> script contains a line like this:
> 
>   download.file("https://someserver.com/mydata.csv", "mydata.csv")
> 
> Then in the next line of code we must be able to expect that the file
> "mydata.csv" we have downloaded to our disk is in fact the file
> "mydata.csv" that was requested from the server. An implementation
> that instead saves an error page (likely html content) to the
> "mydata.csv" file is simply incorrect and will lead to obvious
> problems, even with a warning.
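
One way to get the guarantee described above is to fetch into a temporary file and copy it to the destination only when the server reports success. The sketch below is an assumption-laden illustration, not the implementation of download.file(): the function name download_or_stop and its 'fetch' argument are hypothetical, and 'stop_on_error' is the argument proposed in this message, not an existing one. It relies on curl::curl_fetch_disk() from the 'curl' package, which returns a list containing status_code without raising an error for HTTP failures.

```r
# Hypothetical wrapper, for illustration only. 'fetch' defaults to
# curl::curl_fetch_disk (requires the 'curl' package) and can be replaced
# by any function(url, path) that writes the body to 'path' and returns
# a list with a 'status_code' element.
download_or_stop <- function(url, destfile, stop_on_error = TRUE,
                             fetch = curl::curl_fetch_disk) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)

  res <- fetch(url, tmp)
  status <- as.integer(res$status_code)

  # Default: a server error is an R error and 'destfile' is never touched.
  if (stop_on_error && status >= 400L) {
    stop(sprintf("cannot download '%s': HTTP status was %d", url, status),
         call. = FALSE)
  }

  # Only now does the body reach the requested destination.
  if (!file.copy(tmp, destfile, overwrite = TRUE)) {
    stop(sprintf("cannot write to '%s'", destfile), call. = FALSE)
  }
  invisible(status)
}

# Intended use (needs network access and the 'curl' package):
# download_or_stop("https://someserver.com/mydata.csv", "mydata.csv")
```

With stop_on_error = FALSE the error page is saved deliberately, so retrieving error pages stays possible but has to be requested explicitly.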
> 
> 
> [1] https://www.opencpu.org/posts/cran-https/
> 