[Rd] download.file does not process gz files correctly (truncates them?)
Martin Maechler
maechler at stat.math.ethz.ch
Fri May 4 09:06:15 CEST 2018
>>>>> Tomas Kalibera <tomas.kalibera at gmail.com>
>>>>> on Fri, 4 May 2018 08:34:03 +0200 writes:
> On 05/03/2018 11:14 PM, Henrik Bengtsson wrote:
>> Also, as mentioned in my
>> https://stat.ethz.ch/pipermail/r-devel/2012-August/064739.html,
>> when not specifying the mode argument, the default on
>> Windows is mode = "w" *except* for certain,
>> case-sensitive, filename extensions:
>>
>> if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)))
>> mode <- "wb"
>>
>> Just like the need for mode = "wb" on Windows, the above
>> special-file-extension-hack is only happening on Windows,
>> and is only documented in ?download.file if you're on
>> Windows; so someone who's on Linux/macOS trying to help
>> someone on Windows may not be aware of this. This adds to
>> the confusion, e.g. "works for me".
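To make the heuristic Henrik quotes easier to reason about on any platform, here is a minimal R sketch that reproduces only the default-mode decision from the quoted snippet. The function name `guess_download_mode` is hypothetical (it is not part of base R), and the sketch assumes nothing beyond the one-line rule quoted above; it is not the actual `download.file` source. It also makes the case sensitivity visible: `.GZ` does not trigger binary mode, and `.rds` is not in the list.

```r
# Hypothetical helper (not part of base R): mirrors the Windows-only
# default-mode rule quoted above from download.file().
guess_download_mode <- function(url) {
  # Case-sensitive match on a few "obviously binary" extensions
  if (length(grep("\\.(gz|bz2|xz|tgz|zip|rda|RData)$", url)))
    "wb"  # binary: bytes are written unchanged
  else
    "w"   # text: on Windows, "\n" is written as "\r\n"
}

guess_download_mode("https://example.org/data.csv.gz")  # "wb"
guess_download_mode("https://example.org/data.CSV.GZ")  # "w" (case-sensitive)
guess_download_mode("https://example.org/model.rds")    # "w" (.rds not listed)
```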
> If we were designing the API today, it would probably make
> more sense not to convert any line endings by
> default. Today's editors _usually_ can cope with different
> line endings and it is probably easier to detect that a
> text file has incorrect line endings rather than detecting
> that a binary file has been corrupted by an attempt to
> convert line endings. But whether to change existing,
> documented behavior is a different question. In order to
> help users and programmers who do not read the
> documentation carefully, we would create problems for users
> and programmers who do.
> The current heuristic/hack is in
> line with the compatibility approach: it detects files
> that are obviously binary, so it changes the default
> behavior only for cases when it would obviously cause
> damage.
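Tomas's point that a text file with the "wrong" line endings is easier to detect than a corrupted binary can be illustrated with a short R sketch that inspects a file's raw bytes. The helper name `line_ending_style` is hypothetical and not part of base R; it simply counts how many `"\n"` bytes are preceded by `"\r"`, without any text-mode conversion.

```r
# Hypothetical helper: report which line-ending convention a file uses,
# by scanning its raw bytes (no text-mode conversion involved).
line_ending_style <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  lf <- which(bytes == as.raw(0x0a))             # positions of "\n"
  if (length(lf) == 0L) return("none")
  # count "\n" bytes that are immediately preceded by "\r"
  crlf <- sum(lf > 1L & bytes[pmax(lf - 1L, 1L)] == as.raw(0x0d))
  if (crlf == length(lf)) "CRLF"
  else if (crlf == 0L) "LF"
  else "mixed"
}

tmp <- tempfile()
writeBin(charToRaw("a\r\nb\r\n"), tmp)
line_ending_style(tmp)  # "CRLF"
writeBin(charToRaw("a\nb\n"), tmp)
line_ending_style(tmp)  # "LF"
```

A corrupted gzip file, by contrast, offers no such simple signature; it typically just fails to decompress.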
> Tomas
Thank you, Tomas; I was about to say something similar but
probably less convincingly.
There's one thing on which I strongly agree with Henrik: the
Windows-specific behavior, currently documented only on Windows,
should be documented on all platforms.
I'll update the help page,
and will also add the .rds extension to the above list
[ --- yes, we all should use saveRDS() and readRDS() whenever
sensible in favor of save() and load() ]
Martin
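Until the documentation and defaults are the same everywhere, a portable habit is to pass `mode = "wb"` explicitly, as Henrik suggests further down the thread, and to verify the result with a checksum. The sketch below assumes a local `file://` URL as a stand-in for a real download so that it runs offline; with a remote URL only the `url` line would change.

```r
# Sketch: always request binary mode so the bytes arrive unchanged on every
# platform, then confirm with an MD5 checksum. A local file:// URL stands in
# for a real download so the example runs offline.
src <- tempfile(fileext = ".rds")
saveRDS(mtcars, src)                        # gzip-compressed by default

dest <- tempfile(fileext = ".rds")
url  <- paste0(if (.Platform$OS.type == "windows") "file:///" else "file://",
               normalizePath(src, winslash = "/"))
download.file(url, dest, mode = "wb", quiet = TRUE)

same <- unname(tools::md5sum(src)) == unname(tools::md5sum(dest))
same                                        # TRUE
identical(readRDS(dest), mtcars)            # TRUE
```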
>> /Henrik
>>
>> On Thu, May 3, 2018 at 7:27 AM, Joris Meys
>> <jorismeys at gmail.com> wrote:
>>> Thank you Henrik and Martin for explaining what was
>>> going on. Very insightful!
>>>
>>> On Thu, May 3, 2018 at 4:21 PM, Jeroen Ooms
>>> <jeroenooms at gmail.com> wrote:
>>>> On Thu, May 3, 2018 at 2:42 PM, Henrik Bengtsson
>>>> <henrik.bengtsson at gmail.com> wrote:
>>>>> Use mode="wb" when you download the file. See
>>>>> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
>>>>>
>>>>> R core, and others, is there a good argument for why
>>>>> we are not making this the default download mode? It
>>>>> seems like such a simple fix for such a common
>>>>> "mistake".
>>>> I'd like to second this feature request. This default
>>>> behaviour is unexpected and often leads to R scripts
>>>> that were written on macOS/Linux producing corrupted
>>>> files on Windows, checksum mismatches, etc.
>>>>
>>>> Even for text files, the default should be to download
>>>> the file as-is. Trying to "fix" line-endings should be
>>>> opt-in, never the default. Downloading a file via a
>>>> browser or ftp client on windows also doesn't change
>>>> the file, why should R?
>>>
>>> I third the feature request.
>>>
>>>>
>>>>
>>>> On Thu, May 3, 2018 at 3:02 PM, Duncan Murdoch
>>>> <murdoch.duncan at gmail.com> wrote:
>>>>> Many downloads are text files (HTML, CSV, etc.), and
>>>>> if those are downloaded in binary, a Windows user
>>>>> might end up with a file that Notepad can't handle,
>>>>> because it would have Unix-style line endings.
>>>> True, but I don't think this is relevant. The same holds
>>>> e.g. for the R files in source packages, which also
>>>> have unix line endings. Most Windows users will use an
>>>> actual editor that understands both types of line
>>>> endings, or can convert between the two.
>>>>
>>>> Downloading a file should do just that.
>>>
>>> Again, I agree. In my (limited) experience the only
>>> program that fails to properly display \n as a line
>>> ending is Notepad. But it can still open the file
>>> regardless. If line ending conflicts cause bugs, it's
>>> almost always a unix-like OS struggling with
>>> Windows-style endings. I have yet to encounter a case
>>> the other way around.
>>>
>>> Cheers Joris
>>>
>>>
>>> --
>>> Joris Meys Statistical consultant
>>>
>>> Department of Data Analysis and Mathematical Modelling
>>> Ghent University Coupure Links 653, B-9000 Gent
>>> (Belgium)
>>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel