[Rd] download.file does not process gz files correctly (truncates them?)

Gabe Becker
Mon May 7 17:06:09 CEST 2018


Hey all,

I don't have a strong opinion about whether the default should ultimately
change or not. Many people who use Windows (a set which does not
include me) seem to think it would be better.

I will say that, like Hugh, I'm strongly against making the argument
mandatory as an interim step. That is much less backwards compatible (i.e.,
it would break much more existing code) than simply changing the default.
Instead, I would be for smarter heuristics, perhaps a warning, and
eventually a change, if changing the default is ultimately decided on as
the way forward.
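
A minimal sketch of what a "smarter heuristic plus a non-fatal notice" could look like, assuming the inverted logic Joris suggests in the quoted thread (binary by default, text mode only for known text extensions, matched case-insensitively) combined with Hugh's idea of a message rather than a warning. The helper `choose_download_mode()` and its `text_ext` list are hypothetical and not part of R's utils package; the sketch only relies on `tools::file_ext()`, `tolower()` and `message()`.

```r
# Hypothetical helper (not part of R's utils package): pick a mode for
# download.file() from the destination file name. Binary ("wb") is the
# default; text mode ("w") is used only for a short, assumed list of
# text extensions, compared case-insensitively (so "data.ZIP" and
# "data.Rdata" are handled like their lower-case forms).
choose_download_mode <- function(destfile,
                                 text_ext = c("txt", "csv", "tsv")) {
  # tools::file_ext() returns the extension without the dot, e.g. "gz"
  # for "archive.tar.gz", or "" when there is no extension.
  ext <- tolower(tools::file_ext(destfile))
  mode <- if (ext %in% text_ext) "w" else "wb"
  if (mode == "w") {
    # A message rather than a warning, so existing scripts are not
    # disrupted; it can be silenced with suppressMessages().
    message("Using mode = 'w' (text) for '", basename(destfile),
            "'; set mode = 'wb' if this is not a text file.")
  }
  mode
}

choose_download_mode("data.ZIP")        # returns "wb"
choose_download_mode("archive.tar.gz")  # returns "wb"
choose_download_mode("results.CSV")     # returns "w", plus a message
```

The result could then be passed explicitly, e.g. `download.file(url, destfile, mode = choose_download_mode(destfile))`, where `url` and `destfile` are placeholders. Failing to recognise a text file here only means line endings are left unconverted, whereas failing to recognise a binary file would corrupt it, which is why the sketch errs towards "wb".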

Best,
~G

On Mon, May 7, 2018 at 5:32 AM, Hugh Parsonage <hugh.parsonage at gmail.com> wrote:

> I'd add my support for mode = "wb" to (eventually) become the default,
> though I respect Tomas's comments about backwards compatibility.
>
> Instead of making the argument mandatory (which would immediately
> break scripts, even ones that won't be helped by changing to
> mode = 'wb') or otherwise changing behaviour, perhaps download.file could
> start to emit a message (not a warning) whenever the argument is
> missing on Windows. The message could say something like "Using
> `mode = 'w'`, which will corrupt non-text files. Set `mode = 'wb'` for
> binary downloads or see the help page for other options." Emitting a
> message has the lightest impact on existing scripts, while still alerting
> new users to potential mistakes.
>
> On 7 May 2018 at 18:49, Joris Meys <jorismeys at gmail.com> wrote:
> > Martin, a heartfelt thank you from me as well for taking care of this.
> > Some thoughts on Henrik's response:
> >
> > On Mon, May 7, 2018 at 2:28 AM, Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:
> >
> >>
> >> I still argue that the current behavior causes more harm than good.
> >>
> >
> > I agree with your analysis of the problems this legacy behaviour causes.
> >
> >> Deprecating the default mode = "w" on Windows can be done in steps, e.g.
> >> by making the argument mandatory for a while. This could be done on
> >> all platforms because we're already all affected, i.e. we need to
> >> specify 'mode' to avoid surprises.
> >>
> >
> > That sounds like a reasonable way to move away from this discrepancy
> > between operating systems.
> >
> >
> >> What about case-insensitive matching, e.g. data.ZIP and data.Rdata?
> >>
> >
> > Totally agree, and easily solved by, e.g., adding ignore.case = TRUE to
> > the grep() call.
> >
> >
> >> A quick scan of the R source code suggests that R is also working with
> >> the following filename extensions (using various case styles):
> >>
> >> What about all the other file extensions that we know for sure are binary?
> >>
> >
> > If the default isn't changed, doesn't it make more sense to turn the
> > logic around? Text files downloaded over the internet almost always
> > have a .txt, .csv, or one of a few other extensions used for text data.
> > Those are actually the only files where people using very old Windows
> > text-processing programs can get into trouble. So instead of listing
> > every possible binary extension, one can use "wb" as the default and
> > change to "w" only for text files, rather than the other way around.
> > That would not change the concept of the behaviour, but it ensures that
> > the function doesn't fail to detect a binary file. Failing to detect a
> > text file is far less of a problem, as not converting the line endings
> > doesn't destroy the file.
> >
> > Cheers
> > Joris
> >
> > --
> > Joris Meys
> > Statistical consultant
> >
> > Department of Data Analysis and Mathematical Modelling
> > Ghent University
> > Coupure Links 653, B-9000 Gent (Belgium)
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
>


-- 
Gabriel Becker, Ph.D
Scientist
Bioinformatics and Computational Biology
Genentech Research
