[R] download.file() problems with binary files containing EOF byte in Windows
Scott Sherrill-Mix
@he@cott @ending from pennmedicine@upenn@edu
Mon Aug 20 21:42:22 CEST 2018
Hello,
I'm trying to get a package to pass win-builder and have been having a
bit of trouble with Windows R and binary files (in my case a small
.tar.gz used in testing). After a little debugging, I think I've
narrowed it down to download.file() truncating files at the first '1a'
byte (often used as an EOF marker, but I think a valid byte inside gzip
files) on downloads from local "file://xxx" URLs. I'm trying to figure
out whether this is a known "feature" of Windows that I should just
avoid, or whether it looks like a bug.
For example:
# Write a 75-byte file whose first byte is 1a (decimal 26)
writeBin(26:100,'tmp.bin',size=1)
# Download it through a local file:// URL
download.file('file://tmp.bin','download.bin')
# Compare the source and downloaded file sizes
file.size('tmp.bin')
file.size('download.bin')
On Windows (session info below), I get file sizes of 75 and 0 and on
Linux I get 75 and 75.
As a more real-world example, if I use download.file() on a .gz file,
a remote download returns a different file size from a local
download. For example, for a .gz file from a Google hit about gzip
(http://commandlinefanatic.com/cgi-bin/showarticle.cgi?article=art053):
download.file('http://commandlinefanatic.com/gunzip.c.gz','gunzip.c.gz')
download.file('file://gunzip.c.gz','dl.gz')
file.size('gunzip.c.gz')
file.size('dl.gz')
I get a 4704-byte file for the remote download and a 360-byte file for
the local download on Windows (versus 4704 and 4704 on Linux). Note
that the 361st byte is 1a:
readBin('gunzip.c.gz','raw',361)
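To check the offset without scanning the printed bytes by eye, something like the following sketch could be used. The helper name first_ctrl_z is hypothetical (it is not part of base R), and the demonstration uses a small temporary file rather than gunzip.c.gz so that it runs anywhere:

```r
# Hypothetical helper: 1-based offset of the first 1a byte in a file,
# or NA if the file contains no such byte.
first_ctrl_z <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  hits <- which(bytes == as.raw(0x1a))
  if (length(hits) > 0) hits[1] else NA_integer_
}

# Demonstration on a temporary 40-byte file holding bytes 01..28 (hex),
# so the 26th byte is 1a.
tmp <- tempfile(fileext = ".bin")
writeBin(as.raw(1:40), tmp)
first_ctrl_z(tmp)
```

If the truncation really happens at the first 1a byte, the truncated file size should be one less than the offset this reports (360 for an offset of 361).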
The various download.file() options don't seem to fix this; I get the
same 360 bytes for:
download.file('file://gunzip.c.gz','dl.gz',mode='wb')
file.size('dl.gz')
download.file('file://gunzip.c.gz','dl.gz',mode='wb',method='internal')
file.size('dl.gz')
It looks like the 'auto' and 'internal' methods both resolve to the
'wininet' method on Windows, and mode is automatically set to 'wb' for
gz files, so it is maybe not surprising that those don't change things.
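One possible way to sidestep the question in tests, assuming the file is already on the local disk, might be to copy it with file.copy() instead of going through download.file() with a file:// URL. This is only a sketch and I have not confirmed that it avoids the truncation on Windows; the temporary file names below are placeholders:

```r
# Sketch: copy a local binary file directly instead of "downloading" it.
src <- tempfile(fileext = ".bin")
dest <- tempfile(fileext = ".bin")

# Same test data as above: 75 bytes, the first of which is 1a
writeBin(as.raw(26:100), src)

# file.copy() returns TRUE on success
ok <- file.copy(src, dest, overwrite = TRUE)

# Compare the two files byte for byte rather than by size alone
same <- identical(
  readBin(src, "raw", n = file.size(src)),
  readBin(dest, "raw", n = file.size(dest))
)
c(copied = ok, identical = same)
```

Comparing the raw contents rather than only file.size() also catches the case where the sizes happen to match but the bytes differ.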
Thanks,
Scott
## Windows sessionInfo():
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.1
## Linux sessionInfo():
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.4.4