[Rd] download.file does not process gz files correctly (truncates them?)
Tomas Kalibera
tomas.kalibera at gmail.com
Fri May 4 08:34:03 CEST 2018
On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:
> Also, as mentioned in my
> https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html, when
> not specifying the mode argument, the default on Windows is mode = "w"
> *except* for certain, case-sensitive, filename extensions:
>
> if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)))
> mode <- "wb"
>
> Just like the need for mode = "wb" on Windows, the above
> special-file-extension hack only happens on Windows, and is only
> documented in ?download.file if you're on Windows; so someone on
> Linux/macOS trying to help someone on Windows may not be aware of
> this. This adds even more confusion, e.g. "works for me".
If we were designing the API today, it would probably make more sense
not to convert any line endings by default. Today's editors _usually_
can cope with different line endings and it is probably easier to detect
that a text file has incorrect line endings rather than detecting that a
binary file has been corrupted by an attempt to convert line endings.
But whether to change existing, documented behavior is a different
question. In order to help users and programmers who do not read the
documentation carefully we would create problems for users and
programmers who do. The current heuristic/hack is in line with the
compatibility approach: it detects files that are obviously binary, so
it changes the default behavior only for cases when it would obviously
cause damage.
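
For illustration, the extension check quoted above can be sketched as a
standalone function. This is only a sketch: default_mode_windows() is a
hypothetical helper, not part of R, and the example.org URLs are made up.
It mirrors the quoted regular expression to show which default mode
would be chosen on Windows when 'mode' is not supplied.

```r
# Hypothetical helper (not part of R): mirrors the extension check quoted
# above to show which default mode would be picked on Windows when the
# 'mode' argument is missing.
default_mode_windows <- function(url) {
  binary_ext <- "\\.(gz|bz2|xz|tgz|zip|rda|RData)$"
  if (grepl(binary_ext, url)) "wb" else "w"
}

# Made-up URLs chosen to exercise the edges of the pattern.
urls <- c(
  "https://example.org/data.csv.gz",    # listed extension: binary mode
  "https://example.org/data.csv.GZ",    # upper case: pattern is case-sensitive
  "https://example.org/data.tar",       # binary format, but not in the list
  "https://example.org/data.zip?dl=1"   # query string follows the extension
)

modes <- vapply(urls, default_mode_windows, character(1), USE.NAMES = FALSE)
print(data.frame(url = urls, mode = modes))
```

Only the first URL matches the pattern; for the others the default stays
mode = "w", so passing mode = "wb" explicitly remains the way to avoid
relying on the heuristic.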
Tomas
>
> /Henrik
>
> On Thu, May 3, 2018 at 7:27 AM, Joris Meys <jorismeys at gmail.com> wrote:
>> Thank you Henrik and Martin for explaining what was going on. Very
>> insightful!
>>
>> On Thu, May 3, 2018 at 4:21 PM, Jeroen Ooms <jeroenooms at gmail.com> wrote:
>>> On Thu, May 3, 2018 at 2:42 PM, Henrik Bengtsson
>>> <henrik.bengtsson at gmail.com> wrote:
>>>> Use mode="wb" when you download the file. See
>>>> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
>>>>
>>>> R core, and others, is there a good argument for why we are not making
>>>> this the default download mode? It seems like such a simple fix to such a
>>>> common "mistake".
>>> I'd like to second this feature request. This default behaviour is
>>> unexpected and often leads to R scripts that were written on
>>> mac/linux producing corrupted files on windows, checksum mismatches,
>>> etc.
>>>
>>> Even for text files, the default should be to download the file as-is.
>>> Trying to "fix" line-endings should be opt-in, never the default.
>>> Downloading a file via a browser or ftp client on windows also doesn't
>>> change the file, why should R?
>>
>> I third the feature request.
>>
>>>
>>>
>>> On Thu, May 3, 2018 at 3:02 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>>>> Many downloads are text files (HTML, CSV, etc.), and if those are
>>>> downloaded in binary, a Windows user might end up with a file that
>>>> Notepad can't handle, because it would have Unix-style line endings.
>>> True but I don't think this is relevant. The same holds e.g. for the R
>>> files in source packages, which also have unix line endings. Most
>>> Windows users will use an actual editor that understands both types of
>>> line endings, or can convert between the two.
>>>
>>> Downloading-file should do just that.
>>
>> Again, I agree. In my (limited) experience the only program that fails to
>> properly display \n as a line ending, is Notepad. But it can still open the
>> file regardless. If line ending conflicts cause bugs, it's almost always a
>> unix-like OS struggling with Windows-style endings. I have yet to meet the
>> first one the other way around.
>>
>> Cheers
>> Joris
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Department of Data Analysis and Mathematical Modelling
>> Ghent University
>> Coupure Links 653, B-9000 Gent (Belgium)
>>
>> -----------
>> Biowiskundedagen 2017-2018
>> http://www.biowiskundedagen.ugent.be/
>>
>> -------------------------------
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel