[Rd] download.file does not process gz files correctly (truncates them?)

Hadley Wickham h.wickham at gmail.com
Tue May 8 22:47:57 CEST 2018


On Tue, May 8, 2018 at 8:15 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
> On Thu, May 3, 2018 at 11:34 PM, Tomas Kalibera
> <tomas.kalibera at gmail.com> wrote:
>> On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:
>>>
>>> Also, as mentioned in my
>>> https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when
>>> not specifying the mode argument, the default on Windows is mode = "w"
>>> *except* for certain, case-sensitive, filename extensions:
>>>
>>>      if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$",
>>> url)))
>>>          mode <- "wb"
>>>
>>> Just like the need for mode = "wb" on Windows, the above
>>> special-file-extension hack is applied only on Windows, and is only
>>> documented in ?download.file if you're on Windows; so someone who's on
>>> Linux/macOS trying to help someone on Windows may not be aware of
>>> this. This adds even more confusion, e.g. "works for me".
>>
>> If we were designing the API today, it would probably make more sense not to
>> convert any line endings by default. Today's editors _usually_ can cope with
>> different line endings and it is probably easier to detect that a text file
>> has incorrect line endings rather than detecting that a binary file has been
>> corrupted by an attempt to convert line endings. But whether to change
>> existing, documented behavior is a different question. In order to help
>> users and programmers who do not read the documentation carefully we would
>> create problems for users and programmers who do. The current heuristic/hack
>> is in line with the compatibility approach: it detects files that are
>> obviously binary, so it changes the default behavior only for cases when it
>> would obviously cause damage.
>
> From a purely utilitarian standpoint, there are far more users who do
> not carefully read the documentation than users who do ;)
>
> (I'd also argue that basing the decision on the file extension is
> suboptimal, and it would be better to use the MIME type if provided by
> the server)
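
To make the two approaches concrete, here is a minimal sketch. The regular
expression is the one from the Windows snippet quoted above; the helper names
mode_from_url() and mode_from_content_type() are hypothetical, and the rule
"only text/* is text" is an assumption for illustration, not how
download.file() behaves today.

```r
# Regex taken from the quoted Windows-only snippet; it is case-sensitive.
binary_ext <- "\\.(gz|bz2|xz|tgz|zip|rda|RData)$"

# Hypothetical helper: mimic the extension-based default mode.
mode_from_url <- function(url) {
  if (grepl(binary_ext, url)) "wb" else "w"
}

# Hypothetical helper: pick the mode from a Content-Type header value.
# Assumption: only text/* is treated as text; a missing header means binary.
mode_from_content_type <- function(content_type) {
  if (is.na(content_type) || !nzchar(content_type)) return("wb")
  # Drop parameters such as "; charset=utf-8" and normalise case.
  type <- tolower(trimws(strsplit(content_type, ";", fixed = TRUE)[[1]][1]))
  if (startsWith(type, "text/")) "w" else "wb"
}

mode_from_url("https://example.org/data.csv.gz")   # "wb"
mode_from_url("https://example.org/data.csv.GZ")   # "w": upper-case extension is missed
mode_from_url("https://example.org/data.rds")      # "w": .rds is not in the list
mode_from_content_type("application/gzip")         # "wb"
mode_from_content_type("text/csv; charset=utf-8")  # "w"
```

Whatever the default, passing mode = "wb" explicitly to download.file()
avoids relying on either heuristic.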

Also note that Microsoft just announced support for Unix line endings in Notepad:

https://blogs.msdn.microsoft.com/commandline/2018/05/08/extended-eol-in-notepad/

Hadley

-- 
http://hadley.nz


