[Rd] download.file does not process gz files correctly (truncates them?)

Duncan Murdoch murdoch.duncan at gmail.com
Thu May 3 15:02:23 CEST 2018


On 03/05/2018 8:42 AM, Henrik Bengtsson wrote:
> Use mode="wb" when you download the file. See
> https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
> 
> R core, and others, is there a good argument for why we are not making this
> the default download mode? It seems like such a simple fix to such a
> common "mistake".

Many downloads are text files (HTML, CSV, etc.), and if those are 
downloaded in binary, a Windows user might end up with a file that 
Notepad can't handle, because it would have Unix-style line endings.
(It's possible Notepad no longer requires CR LF endings; I haven't used 
it in years.  But there are probably other brain-dead Windows programs 
that do.)

Duncan Murdoch
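
A minimal sketch of the mode = "wb" advice quoted above, for readers who want
to check it locally. The wrapper name download_binary is hypothetical (it is
not part of R), and a file:// URL to a temporary gzip file stands in for the
GEO download so that no network access is needed. On Unix-alikes text and
binary mode behave the same, so the difference only shows up on Windows.

```r
# Hypothetical helper (not part of R): always request a binary transfer.
download_binary <- function(url, destfile, ...) {
  status <- utils::download.file(url, destfile = destfile, mode = "wb", ...)
  if (status != 0L) stop("download failed with status ", status)
  invisible(destfile)
}

# Build a small gzip file to act as the "remote" resource.
src <- tempfile(fileext = ".gz")
con <- gzfile(src, "wb")
writeLines(c("line one", "line two"), con)
close(con)

# Turn the local path into a file:// URL (Windows paths need a third slash).
src_path <- normalizePath(src, winslash = "/")
src_url <- if (.Platform$OS.type == "windows") {
  paste0("file:///", src_path)
} else {
  paste0("file://", src_path)
}

dest <- download_binary(src_url, tempfile(fileext = ".gz"), quiet = TRUE)

# The copy should be byte-identical to the source ...
same_bytes <- identical(unname(tools::md5sum(src)), unname(tools::md5sum(dest)))

# ... and should still decompress to the original lines.
con <- gzfile(dest, "rb")
roundtrip <- readLines(con)
close(con)
```

For the file in this thread, the same wrapper would be called with the GEO URL
and destfile = "GSM907811.CEL.gz"; whether that resolves the locked-file
symptom as well is not something this sketch tests.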


> 
> Henrik
> 
> On Thu, May 3, 2018, 00:44 Joris Meys <jorismeys at gmail.com> wrote:
> 
>> Dear all,
>>
>> I ran into a problem when trying to download gz files from here:
>> https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM907811
>>
>> At the bottom one can download GSM907811.CEL.gz . If I download this
>> manually and try
>>
>> oligo::read.celfiles("GSM907811.CEL.gz")
>>
>> everything works fine. (oligo is a Bioconductor package)
>>
>> However, if I download using
>>
>> download.file("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz",
>>               destfile = "GSM907811.CEL.gz")
>>
>> The file is downloaded, but oligo::read.celfiles() returns the following
>> error:
>>
>> Error in checkChipTypes(filenames, verbose, "affymetrix", TRUE) :
>>    End of gz file reached unexpectedly. Perhaps this file is truncated.
>>
>> Moreover, if I try to delete it after using download.file(), I get a
>> warning that permission is denied. I can only remove it using Windows file
>> explorer after I closed the R session, indicating that the connection is
>> still open. Yet, showConnections() doesn't show any open connections
>> either.
>>
>> Session info below. Note that I started from a completely fresh R session.
>> oligo is needed due to the specific file format of these gz files. They're
>> not standard tarred files.
>>
>> Cheers
>> Joris
>>
>> Session Info
>>
>> -------------------------------------------------------------------------------------
>>
>> R version 3.5.0 (2018-04-23)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>> Running under: Windows >= 8 x64 (build 9200)
>>
>> Matrix products: default
>>
>> locale:
>> [1] LC_COLLATE=English_United Kingdom.1252
>> [2] LC_CTYPE=English_United Kingdom.1252
>> [3] LC_MONETARY=English_United Kingdom.1252
>> [4] LC_NUMERIC=C
>> [5] LC_TIME=English_United Kingdom.1252
>>
>> attached base packages:
>> [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
>> [9] base
>>
>> other attached packages:
>>  [1] pd.hugene.1.0.st.v1_3.14.1 DBI_0.8                    oligo_1.44.0
>>  [4] Biobase_2.39.2             oligoClasses_1.42.0        RSQLite_2.1.0
>>  [7] Biostrings_2.48.0          XVector_0.19.9             IRanges_2.13.28
>> [10] S4Vectors_0.17.42          BiocGenerics_0.25.3
>>
>> loaded via a namespace (and not attached):
>>   [1] Rcpp_0.12.16                compiler_3.5.0
>>   [3] BiocInstaller_1.30.0        GenomeInfoDb_1.15.5
>>   [5] bitops_1.0-6                iterators_1.0.9
>>   [7] tools_3.5.0                 zlibbioc_1.25.0
>>   [9] digest_0.6.15               bit_1.1-12
>> [11] memoise_1.1.0               preprocessCore_1.41.0
>> [13] lattice_0.20-35             ff_2.2-13
>> [15] pkgconfig_2.0.1             Matrix_1.2-14
>> [17] foreach_1.4.4               DelayedArray_0.5.31
>> [19] yaml_2.1.18                 GenomeInfoDbData_1.1.0
>> [21] affxparser_1.52.0           bit64_0.9-7
>> [23] grid_3.5.0                  BiocParallel_1.13.3
>> [25] blob_1.1.1                  codetools_0.2-15
>> [27] matrixStats_0.53.1          GenomicRanges_1.31.23
>> [29] splines_3.5.0               SummarizedExperiment_1.9.17
>> [31] RCurl_1.95-4.10             affyio_1.49.2
>>
>>
>> --
>> Joris Meys
>> Statistical consultant
>>
>> Department of Data Analysis and Mathematical Modelling
>> Ghent University
>> Coupure Links 653, B-9000 Gent (Belgium)
>>
>> -----------
>> Biowiskundedagen 2017-2018
>> http://www.biowiskundedagen.ugent.be/
>>
>> -------------------------------
>> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>>
>>          [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>



