[Rd] download.file does not process gz files correctly (truncates them?)
Joris Meys
jori@mey@ @ending from gm@il@com
Wed May 2 21:21:47 CEST 2018
Dear all,
I noticed a problem when trying to download gz files from this page:
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM907811
At the bottom of that page one can download GSM907811.CEL.gz. If I download it
manually and try
oligo::read.celfiles("GSM907811.CEL.gz")
everything works fine (oligo is a Bioconductor package).
However, if I download using
download.file(
  "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz",
  destfile = "GSM907811.CEL.gz")
The file is downloaded, but oligo::read.celfiles() returns the following
error:
Error in checkChipTypes(filenames, verbose, "affymetrix", TRUE) :
End of gz file reached unexpectedly. Perhaps this file is truncated.
Moreover, if I try to delete the file after calling download.file(), I get a
warning that permission is denied. I can only remove it with Windows File
Explorer after closing the R session, which suggests that a handle to the file
is still open. Yet showConnections() doesn't list any open connections.
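A possible workaround, offered only as a hedged sketch: on Windows, download.file() writes with mode "w" (text mode) by default and switches to binary mode on its own only when the URL ends in an extension such as .gz. This URL ends in "%2Egz", so that check may not fire, and a text-mode write would corrupt the compressed stream. Assuming that is the cause, passing mode = "wb" explicitly should avoid it. download_gz() is a hypothetical helper, not part of any package; its magic-byte check (0x1f 0x8b) catches an HTML error page saved under a .gz name, but not corruption further into the file.

```r
# Hypothetical helper: download a gzip file in binary mode and check that it
# at least starts like a gzip stream.
# Assumption: the truncation comes from a text-mode write on Windows.
download_gz <- function(url, destfile) {
  # mode = "wb" forces a binary write; with "w" on Windows, every "\n" byte
  # would be written as "\r\n", corrupting the compressed data.
  status <- download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  if (status != 0L) stop("download failed with status ", status)
  # Every gzip stream begins with the two magic bytes 0x1f 0x8b.
  magic <- readBin(destfile, what = "raw", n = 2L)
  identical(magic, as.raw(c(0x1f, 0x8b)))
}

url <- paste0("https://www.ncbi.nlm.nih.gov/geo/download/",
              "?acc=GSM907811&format=file&file=GSM907811%2ECEL%2Egz")

# Only hit the network in an interactive session.
if (interactive()) {
  download_gz(url, "GSM907811.CEL.gz")
}
```

If this assumption holds, oligo::read.celfiles("GSM907811.CEL.gz") should then read the file just as it does after a manual download.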
Session info below. Note that I started from a completely fresh R session.
oligo is needed because of the specific file format of these gz files: they
are not standard tarballs.
Cheers
Joris
Session Info
-------------------------------------------------------------------------------------
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
[9] base
other attached packages:
 [1] pd.hugene.1.0.st.v1_3.14.1 DBI_0.8                    oligo_1.44.0
 [4] Biobase_2.39.2             oligoClasses_1.42.0        RSQLite_2.1.0
 [7] Biostrings_2.48.0          XVector_0.19.9             IRanges_2.13.28
[10] S4Vectors_0.17.42          BiocGenerics_0.25.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.16 compiler_3.5.0
[3] BiocInstaller_1.30.0 GenomeInfoDb_1.15.5
[5] bitops_1.0-6 iterators_1.0.9
[7] tools_3.5.0 zlibbioc_1.25.0
[9] digest_0.6.15 bit_1.1-12
[11] memoise_1.1.0 preprocessCore_1.41.0
[13] lattice_0.20-35 ff_2.2-13
[15] pkgconfig_2.0.1 Matrix_1.2-14
[17] foreach_1.4.4 DelayedArray_0.5.31
[19] yaml_2.1.18 GenomeInfoDbData_1.1.0
[21] affxparser_1.52.0 bit64_0.9-7
[23] grid_3.5.0 BiocParallel_1.13.3
[25] blob_1.1.1 codetools_0.2-15
[27] matrixStats_0.53.1 GenomicRanges_1.31.23
[29] splines_3.5.0 SummarizedExperiment_1.9.17
[31] RCurl_1.95-4.10 affyio_1.49.2
--
Joris Meys
Statistical consultant
Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
More information about the R-devel
mailing list