[Rd] download.file does not process gz files correctly (truncates them?)

Duncan Murdoch murdoch.duncan at gmail.com
Wed May 9 14:52:19 CEST 2018

On 08/05/2018 4:47 PM, Hadley Wickham wrote:
> On Tue, May 8, 2018 at 8:15 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
>> On Thu, May 3, 2018 at 11:34 PM, Tomas Kalibera
>> <tomas.kalibera at gmail.com> wrote:
>>> On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:
>>>> Also, as mentioned in my
>>>> https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when
>>>> not specifying the mode argument, the default on Windows is mode = "w"
>>>> *except* for certain, case-sensitive, filename extensions:
>>>>       if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)))
>>>>           mode <- "wb"
>>>> Just like the need for mode = "wb" on Windows, the above
>>>> special-file-extension-hack is only happening on Windows, and is only
>>>> documented in ?download.file if you're on Windows; so someone who's on
>>>> Linux/macOS trying to help someone on Windows may not be aware of
>>>> this. This adds even more confusion, e.g. "works for me".
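To make the quoted heuristic concrete, here is a minimal R sketch that reimplements only the extension test as a hypothetical helper, `default_mode_windows()`. It is not the actual body of `download.file`. It shows that the match is case-sensitive and anchored at the end of the URL, so an upper-case extension or a trailing query string falls back to `"w"`. The last lines, commented out to avoid a network call, show the portable habit of passing `mode = "wb"` explicitly; the `example.com` URLs are placeholders.

```r
# Hypothetical helper that mimics only the extension test quoted above.
# The regex is case-sensitive and anchored with `$`, so the extension
# must be the very last thing in the URL.
default_mode_windows <- function(url) {
  if (grepl("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)) "wb" else "w"
}

urls <- c("https://example.com/data.csv.gz",           # matches ".gz"
          "https://example.com/DATA.CSV.GZ",           # upper case: no match
          "https://example.com/archive.tar.gz?raw=1",  # query string: no match
          "https://example.com/model.RData")           # matches ".RData"
modes <- vapply(urls, default_mode_windows, character(1), USE.NAMES = FALSE)
print(modes)  # "wb" "w" "w" "wb"

# Portable habit: state the mode explicitly for binary files so the
# result does not depend on the platform or on how the URL is spelled.
# download.file("https://example.com/data.csv.gz", destfile = "data.csv.gz",
#               mode = "wb")
```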
>>> If we were designing the API today, it would probably make more sense not to
>>> convert any line endings by default. Today's editors _usually_ can cope with
>>> different line endings and it is probably easier to detect that a text file
>>> has incorrect line endings rather than detecting that a binary file has been
>>> corrupted by an attempt to convert line endings. But whether to change
>>> existing, documented behavior is a different question. In order to help
>>> users and programmers who do not read the documentation carefully, we would
>>> create problems for users and programmers who do. The current heuristic/hack
>>> is in line with the compatibility approach: it detects files that are
>>> obviously binary, so it changes the default behavior only in cases where the
>>> default would obviously cause damage.
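The point that a wrong line ending is easier to spot than a corrupted binary can be illustrated with a small sketch. The hypothetical helper `line_ending_style()` below reads a file as raw bytes, so no text-mode translation is involved, and classifies its line endings as LF, CRLF, or mixed. It is an illustration only and does not try to decide whether the file is text in the first place.

```r
# Hypothetical helper: classify the line endings of a file by scanning
# its raw bytes, so no text-mode newline translation can interfere.
line_ending_style <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  lf <- which(bytes == as.raw(0x0a))  # positions of "\n"
  if (length(lf) == 0) return("none")
  # Count the "\n" bytes that are immediately preceded by "\r".
  crlf <- sum(lf > 1 & bytes[pmax(lf - 1, 1)] == as.raw(0x0d))
  if (crlf == length(lf)) "CRLF" else if (crlf == 0) "LF" else "mixed"
}

# Small demonstration on temporary files.
dos  <- tempfile(); writeBin(charToRaw("a\r\nb\r\n"), dos)
unix <- tempfile(); writeBin(charToRaw("a\nb\n"), unix)
line_ending_style(dos)   # "CRLF"
line_ending_style(unix)  # "LF"
```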
>> From a purely utilitarian standpoint, there are far more users who do
>> not carefully read the documentation than users who do ;)
>> (I'd also argue that basing the decision on the file extension is
>> suboptimal, and it would be better to use the MIME type if provided by
>> the server.)
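The MIME-type alternative could look roughly like the sketch below, which assumes the server sends a meaningful `Content-Type` header. `mode_from_headers()` is a hypothetical helper, not part of R: it takes header lines such as those returned by base R's `curlGetHeaders()`, treats `text/*` plus a couple of textual `application/*` types as text, and falls back to binary otherwise. The list of textual types is an assumption chosen for illustration.

```r
# Hypothetical helper: choose a download.file() mode from the
# Content-Type header instead of the URL's extension.
mode_from_headers <- function(headers) {
  ct <- grep("^content-type:", headers, ignore.case = TRUE, value = TRUE)
  if (length(ct) == 0) return("wb")  # no header: binary is the safe choice
  # If headers from several responses are present (as can happen when
  # redirects are followed), use the last one.
  value <- tolower(trimws(sub("^[^:]*:", "", ct[length(ct)])))
  mime <- trimws(sub(";.*$", "", value))  # drop parameters such as charset
  textual <- startsWith(mime, "text/") ||
    mime %in% c("application/json", "application/xml")
  if (textual) "w" else "wb"
}

mode_from_headers(c("HTTP/1.1 200 OK\r\n",
                    "Content-Type: application/gzip\r\n"))         # "wb"
mode_from_headers(c("HTTP/1.1 200 OK\r\n",
                    "Content-Type: text/csv; charset=utf-8\r\n"))  # "w"

# With network access one might write (URL is a placeholder):
# url <- "https://example.com/data.csv.gz"
# download.file(url, "data.csv.gz", mode = mode_from_headers(curlGetHeaders(url)))
```

Falling back to `"wb"` reflects the asymmetry discussed earlier in the thread: a text file saved without line-ending conversion is still usable, whereas a binary file saved with it may be corrupted.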
> Also note that Microsoft just announced support for Unix line endings in Notepad
> https://blogs.msdn.microsoft.com/commandline/2018/05/08/extended-eol-in-notepad/

Perhaps soon RStudio will follow Notepad's lead, and not convert line 
endings when it saves a non-native file.

Duncan Murdoch
