[Rd] download.file does not process gz files correctly (truncates them?)
Henrik Bengtsson
Thu May 3 14:42:08 CEST 2018
Use mode="wb" when you download the file. See
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/30.
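A minimal sketch of what I mean, in case it helps. The wrapper name download_binary() is hypothetical, not something in base R; the only real change is passing mode = "wb" to download.file(), so the file is written in binary mode rather than text mode (a distinction that matters on Windows):

```r
# Hypothetical helper (not part of R): always download in binary mode.
download_binary <- function(url, destfile, ...) {
  # mode = "wb" writes the bytes unchanged; with a text-mode write on
  # Windows, a binary file such as a .gz archive can end up corrupted.
  status <- utils::download.file(url, destfile = destfile, mode = "wb", ...)
  # download.file() returns 0 (invisibly) on success.
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}
```

With the URL from the report below, the call would be download_binary("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz", "GSM907811.CEL.gz"), after which oligo::read.celfiles("GSM907811.CEL.gz") should be able to read the file.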
R core and others: is there a good argument for not making this the
default download mode? It seems like such a simple fix for such a
common "mistake".
Henrik
On Thu, May 3, 2018, 00:44 Joris Meys <jorismeys at gmail.com> wrote:
> Dear all,
>
> I noticed a problem when trying to download gz files from here:
> https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM907811
>
> At the bottom one can download GSM907811.CEL.gz . If I download this
> manually and try
>
> oligo::read.celfiles("GSM907811.CEL.gz")
>
> everything works fine. (oligo is a Bioconductor package.)
>
> However, if I download using
>
> download.file("https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz",
>               destfile = "GSM907811.CEL.gz")
>
> The file is downloaded, but oligo::read.celfiles() returns the following
> error:
>
> Error in checkChipTypes(filenames, verbose, "affymetrix", TRUE) :
> End of gz file reached unexpectedly. Perhaps this file is truncated.
>
> Moreover, if I try to delete it after using download.file(), I get a
> warning that permission is denied. I can only remove it using Windows file
> explorer after I closed the R session, indicating that the connection is
> still open. Yet, showConnections() doesn't show any open connections
> either.
>
> Session info below. Note that I started from a completely fresh R session.
> oligo is needed due to the specific file format of these gz files. They're
> not standard tarred files.
>
> Cheers
> Joris
>
> Session Info
>
> -------------------------------------------------------------------------------------
>
> R version 3.5.0 (2018-04-23)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows >= 8 x64 (build 9200)
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats4 parallel stats graphics grDevices utils datasets methods
> [9] base
>
> other attached packages:
> [1] pd.hugene.1.0.st.v1_3.14.1 DBI_0.8 oligo_1.44.0
> [4] Biobase_2.39.2 oligoClasses_1.42.0 RSQLite_2.1.0
> [7] Biostrings_2.48.0 XVector_0.19.9 IRanges_2.13.28
> [10] S4Vectors_0.17.42 BiocGenerics_0.25.3
>
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.16 compiler_3.5.0
> [3] BiocInstaller_1.30.0 GenomeInfoDb_1.15.5
> [5] bitops_1.0-6 iterators_1.0.9
> [7] tools_3.5.0 zlibbioc_1.25.0
> [9] digest_0.6.15 bit_1.1-12
> [11] memoise_1.1.0 preprocessCore_1.41.0
> [13] lattice_0.20-35 ff_2.2-13
> [15] pkgconfig_2.0.1 Matrix_1.2-14
> [17] foreach_1.4.4 DelayedArray_0.5.31
> [19] yaml_2.1.18 GenomeInfoDbData_1.1.0
> [21] affxparser_1.52.0 bit64_0.9-7
> [23] grid_3.5.0 BiocParallel_1.13.3
> [25] blob_1.1.1 codetools_0.2-15
> [27] matrixStats_0.53.1 GenomicRanges_1.31.23
> [29] splines_3.5.0 SummarizedExperiment_1.9.17
> [31] RCurl_1.95-4.10 affyio_1.49.2
>
>
> --
> Joris Meys
> Statistical consultant
>
> Department of Data Analysis and Mathematical Modelling
> Ghent University
> Coupure Links 653, B-9000 Gent (Belgium)