[Bioc-devel] Windows-only issue with downloading a Rdata file and loading it with R

Leonardo Collado Torres lcollado at jhu.edu
Fri Jun 17 22:44:43 CEST 2016


Hi,

I'm trying to figure out what is going wrong with an error that pops
up on Windows only. It's currently the only error for a package that I
recently submitted to Bioc. The function is fairly simple: it
downloads an Rdata file from the web and loads it.

If I try to download and load the file with R, the following error
occurs (only on Windows):


> library('downloader')
> download('https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_sra.Rdata?raw=true', destfile = 'test.Rdata')
trying URL 'https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_sra.Rdata?raw=true'
Content type 'application/octet-stream' length 2531337 bytes (2.4 MB)
downloaded 2.4 MB

> load('test.Rdata')
Error: ReadItem: unknown type 50, perhaps written by later version of R
> traceback()
1: load("test.Rdata")
> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.3.0 (2016-05-03)
 system   x86_64, mingw32
 ui       Rgui
 language (EN)
 collate  English_United States.1252
 tz       America/New_York
 date     2016-06-17

Packages ---------------------------------------------------------------------------------------------------------------
 package    * version date       source
 devtools     1.11.1  2016-04-21 CRAN (R 3.3.0)
 digest       0.6.9   2016-01-08 CRAN (R 3.3.0)
 downloader * 0.4     2015-07-09 CRAN (R 3.3.0)
 memoise      1.0.0   2016-01-29 CRAN (R 3.3.0)
 withr        1.0.1   2016-02-04 CRAN (R 3.3.0)
>


If I open the same URL in my browser and download the file manually,
then everything works, as shown below:

> load('metadata_clean_sra.Rdata')
> metadata_clean
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
## removed more output

> options(width = 120)
> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.3.0 (2016-05-03)
 system   x86_64, mingw32
 ui       Rgui
 language (EN)
 collate  English_United States.1252
 tz       America/New_York
 date     2016-06-17

Packages ---------------------------------------------------------------------------------------------------------------
 package      * version date       source
 BiocGenerics * 0.19.1  2016-06-17 Bioconductor
 devtools       1.11.1  2016-04-21 CRAN (R 3.3.0)
 digest         0.6.9   2016-01-08 CRAN (R 3.3.0)
 IRanges      * 2.7.2   2016-06-07 Bioconductor
 memoise        1.0.0   2016-01-29 CRAN (R 3.3.0)
 S4Vectors    * 0.11.3  2016-06-03 Bioconductor
 withr          1.0.1   2016-02-04 CRAN (R 3.3.0)
> print(object.size(metadata_clean), units = 'Mb')
30.5 Mb

The object itself is a DataFrame and was created using R 3.3.1 with
S4Vectors version 0.11.4. I get the same error if I re-save the data
on a Unix machine using R 3.3.0 (with S4Vectors from Bioc-release).

Some Google leads point to a "corrupt file" or to something about a
hidden session Rdata file. But from the manual test, everything looks
fine, unless downloader::download() (or alternatively
utils::download.file()) is corrupting the file.
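
One way to check the corruption hypothesis is to compare the
R-downloaded file with the browser-downloaded one by size and checksum,
and to retry the download in binary mode. The sketch below is only an
illustration: files_match() and download_binary() are hypothetical
helper names, and whether a text-mode transfer is the actual cause here
is an assumption. It relies on the documented mode argument of
utils::download.file(), whose default "w" is a text-mode write on
Windows; downloader::download() passes extra arguments on to
download.file(), so mode = "wb" could be supplied there as well.

```r
# Hypothetical helper: print size and MD5 checksum for two files and
# return TRUE only if both checksums exist and are equal.
files_match <- function(path_a, path_b) {
  paths <- c(path_a, path_b)
  info <- data.frame(
    file  = paths,
    bytes = file.size(paths),
    md5   = unname(tools::md5sum(paths)),
    stringsAsFactors = FALSE
  )
  print(info)
  !anyNA(info$md5) && info$md5[1] == info$md5[2]
}

# Hypothetical wrapper: request a binary transfer. The assumption is
# that the default mode = "w" alters the binary Rdata file on Windows.
download_binary <- function(url, destfile) {
  utils::download.file(url, destfile = destfile, mode = "wb")
}

# Usage (not run here, needs network access; 'url' is the GitHub URL
# from the transcript above):
# download_binary(url, "test_wb.Rdata")
# files_match("test_wb.Rdata", "metadata_clean_sra.Rdata")
```

If the checksums differ for the default download but match for the
binary-mode one, that would point to the transfer mode rather than to
the file itself.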


An option would be to include the data in the package, but I'd like to
avoid doing so to minimize the package size. It already has a big
data.frame that is necessary for the package to work. This short
function is there for convenience.

Best,
Leo


