[Rd] download.file() issue with pdf docs on Windows: Set mode="wb" automatically?

Andreas Blätte
Fri Jun 16 13:00:47 CEST 2023


Dear colleagues,

 

Windows users in an R course I teach ran into problems when they downloaded a PDF document with `download.file()` and then tried to open it with `pdftools::pdf_info()`.

 

Indeed, on Windows, PDF files downloaded with `download.file()` are corrupted unless you set `mode="wb"`. This is to be expected. The documentation of `download.file()` says clearly:

  

"""

The choice of binary transfer (mode = "wb" or "ab") is important on Windows, since unlike Unix-alikes it does distinguish between text and binary files and for text transfers changes \n line endings to \r\n (aka ‘CRLF’).

 

On Windows, if mode is not supplied (missing()) and url ends in one of .gz, .bz2, .xz, .tgz, .zip, .jar, .rda, .rds or .RData, mode = "wb" is set so that a binary transfer is done to help unwary users.

 

Code written to download binary files must use mode = "wb" (or "ab"), but the problems incurred by a text transfer will only be seen on Windows.

"""

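To illustrate why a text transfer damages a binary file, here is a small simulation of the line-ending translation described in the documentation. It is only a sketch: `simulate_text_transfer()` and the example bytes are made up for illustration, and the function merely imitates the \n to \r\n replacement rather than calling any Windows code.

```r
# A few bytes as they might appear in a binary file: "%PDF", a line feed,
# one arbitrary byte, and another line feed (0x0a) that is data, not text.
original <- as.raw(c(0x25, 0x50, 0x44, 0x46, 0x0a, 0x01, 0x0a))

# Hypothetical helper: imitate a text-mode transfer by replacing
# every LF (0x0a) with CR LF (0x0d 0x0a).
simulate_text_transfer <- function(bytes) {
    unlist(lapply(bytes, function(b) {
        if (b == as.raw(0x0a)) as.raw(c(0x0d, 0x0a)) else b
    }))
}

translated <- simulate_text_transfer(original)

length(original)    # 7
length(translated)  # 9: two extra bytes were inserted
identical(original, translated)  # FALSE
```

Every inserted byte shifts the data that follows it, which a format relying on exact byte offsets cannot tolerate.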
 

However, many "unwary users" will not read the (very clear) documentation. I therefore suggest adding PDF documents to the list of file types for which mode = "wb" is set automatically.

 

This would require changing this line of the R source code:

https://github.com/wch/r-source/blob/197d25ca9c5a5132dbc366667137ed11255c099b/src/library/utils/R/windows/download.file.R#L30

 

As follows:

if(missing(mode) && length(grep("\\.(gz|bz2|xz|tgz|zip|jar|rd[as]|RData|pdf)$", URLdecode(url))))
    mode <- "wb"
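To see which URLs the proposed pattern would catch, the condition can be tried outside of `download.file()`. This is a minimal sketch: `would_use_wb()` is a hypothetical helper and the example.org URLs are made up for illustration.

```r
# The extension pattern from the proposed change, including pdf.
binary_pattern <- "\\.(gz|bz2|xz|tgz|zip|jar|rd[as]|RData|pdf)$"

# Hypothetical helper (not part of R): applies the same test as the
# proposed condition to each URL in turn.
would_use_wb <- function(urls) {
    vapply(urls, function(u) grepl(binary_pattern, URLdecode(u)),
           logical(1), USE.NAMES = FALSE)
}

urls <- c(
    "https://example.org/report.pdf",
    "https://example.org/data.rds",
    "https://example.org/REPORT.PDF",       # upper-case extension
    "https://example.org/report.pdf?dl=1"   # query string after the extension
)

would_use_wb(urls)
# TRUE TRUE FALSE FALSE
```

The pattern is case-sensitive and anchored at the end of the URL, so the last two cases would still need an explicit mode = "wb".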

 

I am not sure whether you would see this as arbitrarily violating some logic. Yet I am quite sure that many users not used to reading the documentation carefully have struggled with this issue.

 

This is an issue I wrote for my course:

https://github.com/ablaette/learningR/issues/24

 

And this is the code that we used:

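gruene_btw2021 <- "https://cms.gruene.de/uploads/documents/2021_Wahlprogrammentwurf.pdf"
gruene_btw2021_local <- tempfile()
# no mode argument, so the default mode = "w" (text transfer) is used
download.file(url = gruene_btw2021, destfile = gruene_btw2021_local)
# on Windows, this is where the corrupted file shows up
pdftools::pdf_info(gruene_btw2021_local)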
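For comparison, the workaround the documentation asks for is to request a binary transfer explicitly. In a course setting this could be wrapped once, as sketched below; `download_binary()` is a hypothetical name, not an existing function in R or pdftools.

```r
# Hypothetical convenience wrapper (not part of R or pdftools):
# always requests a binary transfer, so the downloaded file is
# byte-identical on Windows and on Unix-alikes.
download_binary <- function(url, destfile, ...) {
    utils::download.file(url = url, destfile = destfile, mode = "wb", ...)
}
```

With this wrapper, the download above would be written as `download_binary(gruene_btw2021, gruene_btw2021_local)`, which is equivalent to adding `mode = "wb"` to the original `download.file()` call.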

 

 

Kind regards

Andreas

 

 

 

--

Prof. Dr. Andreas Blaette

Professor of Public Policy 

University of Duisburg-Essen 




