[Rd] download.file does not process gz files correctly (truncates them?)

Hugh Parsonage hugh.parsonage at gmail.com
Mon May 7 14:32:36 CEST 2018


I'd add my support for mode = "wb" (eventually) becoming the default,
though I respect Tomas's comments about backwards compatibility.

Instead of making the argument mandatory (which would immediately
break scripts, even ones that wouldn't be helped by changing to
mode = "wb") or otherwise changing behaviour, perhaps download.file
could start to emit a message (not a warning) whenever the argument is
missing on Windows. The message could say something like: "Using
`mode = 'w'`, which will corrupt non-text files. Set `mode = 'wb'` for
binary downloads, or see the help page for other options." Emitting a
message has the lightest impact on existing scripts, while alerting
new users to a likely mistake before they make it.
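
To make the idea concrete, here is a rough sketch of what I have in
mind, written as a wrapper rather than a patch to download.file itself.
The names notify_default_mode() and download_file_notify() are
hypothetical and do not exist in R; the message wording is only the
suggestion from above.

```r
# Hypothetical helper (not part of R): emit a message, not a warning,
# when 'mode' was left unset on Windows.
notify_default_mode <- function(is_windows = .Platform$OS.type == "windows") {
  if (is_windows) {
    message("Using `mode = 'w'`, which will corrupt non-text files. ",
            "Set `mode = 'wb'` for binary downloads, ",
            "or see ?download.file for other options.")
  }
  invisible(is_windows)
}

# Hypothetical wrapper: behaves like download.file(), plus the notice.
download_file_notify <- function(url, destfile, mode, ...) {
  if (missing(mode)) {
    notify_default_mode()
    # Leave 'mode' unset so download.file() applies its own default.
    utils::download.file(url, destfile, ...)
  } else {
    utils::download.file(url, destfile, mode = mode, ...)
  }
}
```

Because it is a message, existing scripts keep running unchanged, and
anyone who wants silence can wrap the call in suppressMessages().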

On 7 May 2018 at 18:49, Joris Meys <jorismeys at gmail.com> wrote:
> Martin, also from me a heartfelt thank you for taking care of this. Some
> thoughts on Henrik's response:
>
> On Mon, May 7, 2018 at 2:28 AM, Henrik Bengtsson
> <henrik.bengtsson at gmail.com> wrote:
>
>>
>> I still argue that the current behavior causes more harm than good.
>>
>
> I agree with your analysis of the problems this legacy behaviour causes.
>
>> Deprecating the default mode="w" on Windows can be done in steps, e.g.
>> by making the argument mandatory for a while. This could be done on
>> all platforms because we're already all affected, i.e. we need to
>> specify 'mode' to avoid surprises.
>>
>
> That sounds like a reasonable way to move away from this discrepancy
> between operating systems.
>
>
>> What about case-insensitive matching, e.g. data.ZIP and data.Rdata?
>>
>
> Totally agree, and easily solved by e.g. adding ignore.case = TRUE to the
> grep() call.
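
A small illustration of the difference ignore.case makes. The pattern
and the example URLs here are assumptions chosen for illustration, not
copied from the R sources.

```r
# Illustrative extension pattern (an assumption, not the one in R itself).
binary_ext <- "\\.(gz|bz2|xz|tgz|zip|rda|rds|RData)$"

urls <- c("http://example.org/data.zip",
          "http://example.org/data.ZIP",
          "http://example.org/data.Rdata",
          "http://example.org/notes.txt")

# Case-sensitive: only the exact spellings in the pattern match.
sensitive <- grepl(binary_ext, urls)
# TRUE FALSE FALSE FALSE

# Case-insensitive: data.ZIP and data.Rdata are caught as well.
insensitive <- grepl(binary_ext, urls, ignore.case = TRUE)
# TRUE TRUE TRUE FALSE
```
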
>
>
>> A quick scan of the R source code suggests that R is also working with
>> the following filename extensions (using various case styles):
>>
>> What about all the other file extensions that we know for sure are binary?
>>
>
> If the default isn't changed, doesn't it make more sense to actually turn
> the logic around? Text files that are downloaded over the internet are
> almost always .txt, .csv, or a few other extensions used for text data.
> Those are actually the only files where some people with very old Windows
> programs for text processing can get into trouble. So instead of adding
> every possible binary extension, one can put "wb" as default and change to
> "w" if it is a text file instead of the other way around. That would not
> change the concept of the behaviour, but ensures that the function doesn't
> fail to detect a binary file. Not detecting a text file is far less of a
> problem, as not converting the line endings doesn't destroy the file.
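
For what it's worth, the inverted logic could look roughly like the
sketch below. guess_mode() is a hypothetical helper, and the list of
text extensions (in particular "tsv") is my assumption; only .txt and
.csv are named above.

```r
# Hypothetical helper (not part of R): default to binary mode and switch
# to text mode only for a short list of known text extensions.
guess_mode <- function(url, text_ext = c("txt", "csv", "tsv")) {
  # Drop any query string or fragment before looking at the extension.
  path <- sub("[?#].*$", "", url)
  # Lower-case the extension so that data.CSV is treated like data.csv.
  ext <- tolower(tools::file_ext(path))
  if (ext %in% text_ext) "w" else "wb"
}

guess_mode("http://example.org/data.CSV")        # "w"
guess_mode("http://example.org/archive.tar.gz")  # "wb"
guess_mode("http://example.org/download")        # "wb"
```

An unknown or missing extension falls through to "wb", so the failure
case is an unconverted text file rather than a corrupted binary one.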
>
> Cheers
> Joris
>
> --
> Joris Meys
> Statistical consultant
>
> Department of Data Analysis and Mathematical Modelling
> Ghent University
> Coupure Links 653, B-9000 Gent (Belgium)
>


