[Bioc-devel] Windows-only issue with downloading a Rdata file and loading it with R

Martin Morgan martin.morgan at roswellpark.org
Sat Jun 18 16:41:40 CEST 2016


On 06/18/2016 12:58 AM, Leonardo Collado Torres wrote:
> Hi,
>
> I get the same error while hosting the data somewhere else or when using
> RawGit's url. That is:
>
>> library('downloader')
>> download('http://www.biostat.jhsph.edu/~lcollado/recount/metadata_clean_sra.Rdata',
>     destfile = 'test2.Rdata')
>> load('test2.Rdata')
> Error: ReadItem: unknown type 50, perhaps written by later version of R
>> download('https://cdn.rawgit.com/leekgroup/recount-website/master/metadata/metadata_clean_sra.Rdata',
>     destfile = 'test3.Rdata')
>> load('test3.Rdata')
> Error: ReadItem: unknown type 50, perhaps written by later version of R
>
> Again, it only happens on Windows but not on the other OS. So it doesn't
> look like a GitHub issue.

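Use mode = "wb" to download in binary mode. On Windows the default text
mode ("w") can alter bytes as the file is written, which corrupts binary
files such as .Rdata.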
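
A minimal sketch of that fix, using utils::download.file() directly. The
helper name download_rdata() and its return value are made up for
illustration and are not part of any package. As far as I know,
downloader::download() forwards extra arguments to download.file(), so
adding mode = "wb" to the original download() call should have the same
effect.

```r
# Hypothetical helper (not from the thread): fetch an .Rdata file in
# binary mode and load it into a fresh environment.
download_rdata <- function(url, destfile = tempfile(fileext = ".Rdata")) {
    # mode = "wb" writes the bytes exactly as received; this is the
    # setting that matters on Windows.
    status <- utils::download.file(url, destfile = destfile, mode = "wb",
                                   quiet = TRUE)
    if (status != 0) {
        stop("download failed with status ", status)
    }
    env <- new.env()
    # load() returns the names of the objects it restored.
    loaded <- load(destfile, envir = env)
    list(path = destfile, names = loaded, env = env)
}

# Usage with the URL from the report (needs network access):
# res <- download_rdata(
#     "http://www.biostat.jhsph.edu/~lcollado/recount/metadata_clean_sra.Rdata")
# res$names
```

Loading into a separate environment keeps the downloaded objects from
overwriting anything in the caller's workspace.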

Martin

>
> Best,
> Leo
>
>
> On Fri, Jun 17, 2016 at 4:57 PM, Gabe Becker <becker.gabe at gene.com> wrote:
>
>> I wonder if raw only means "raw after line return munging"? Can you attach
>> the file that gets downloaded via email? (Off list is fine.)
>>
>> On Fri, Jun 17, 2016 at 1:44 PM, Leonardo Collado Torres <lcollado at jhu.edu> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to figure out what is going wrong with an error that pops
>>> up on Windows only. It's currently the only error for a package that I
>>> recently submitted to Bioc. The function is fairly simple: it
>>> downloads an Rdata file from the web and loads it.
>>>
>>> If I try to download and load the file with R, the following error
>>> occurs (only on Windows):
>>>
>>>
>>>> library('downloader')
>>>> download('https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_sra.Rdata?raw=true',
>>>     destfile = 'test.Rdata')
>>> trying URL 'https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_sra.Rdata?raw=true'
>>> Content type 'application/octet-stream' length 2531337 bytes (2.4 MB)
>>> downloaded 2.4 MB
>>>
>>>> load('test.Rdata')
>>> Error: ReadItem: unknown type 50, perhaps written by later version of R
>>>> traceback()
>>> 1: load("test.Rdata")
>>>> options(width = 120)
>>>> devtools::session_info()
>>> Session info
>>> -----------------------------------------------------------------------------------------------------------
>>>   setting  value
>>>   version  R version 3.3.0 (2016-05-03)
>>>   system   x86_64, mingw32
>>>   ui       Rgui
>>>   language (EN)
>>>   collate  English_United States.1252
>>>   tz       America/New_York
>>>   date     2016-06-17
>>>
>>> Packages
>>> ---------------------------------------------------------------------------------------------------------------
>>>   package    * version date       source
>>>   devtools     1.11.1  2016-04-21 CRAN (R 3.3.0)
>>>   digest       0.6.9   2016-01-08 CRAN (R 3.3.0)
>>>   downloader * 0.4     2015-07-09 CRAN (R 3.3.0)
>>>   memoise      1.0.0   2016-01-29 CRAN (R 3.3.0)
>>>   withr        1.0.1   2016-02-04 CRAN (R 3.3.0)
>>>>
>>>
>>>
>>> If I open the same url on my browser and manually download the file,
>>> then everything works as shown below:
>>>
>>>> load('metadata_clean_sra.Rdata')
>>>> metadata_clean
>>> Loading required package: S4Vectors
>>> Loading required package: stats4
>>> Loading required package: BiocGenerics
>>> Loading required package: parallel
>>> ## removed more output
>>>
>>>> options(width = 120)
>>>> devtools::session_info()
>>> Session info
>>> -----------------------------------------------------------------------------------------------------------
>>>   setting  value
>>>   version  R version 3.3.0 (2016-05-03)
>>>   system   x86_64, mingw32
>>>   ui       Rgui
>>>   language (EN)
>>>   collate  English_United States.1252
>>>   tz       America/New_York
>>>   date     2016-06-17
>>>
>>> Packages
>>> ---------------------------------------------------------------------------------------------------------------
>>>   package      * version date       source
>>>   BiocGenerics * 0.19.1  2016-06-17 Bioconductor
>>>   devtools       1.11.1  2016-04-21 CRAN (R 3.3.0)
>>>   digest         0.6.9   2016-01-08 CRAN (R 3.3.0)
>>>   IRanges      * 2.7.2   2016-06-07 Bioconductor
>>>   memoise        1.0.0   2016-01-29 CRAN (R 3.3.0)
>>>   S4Vectors    * 0.11.3  2016-06-03 Bioconductor
>>>   withr          1.0.1   2016-02-04 CRAN (R 3.3.0)
>>>> print(object.size(metadata_clean), units = 'Mb')
>>> 30.5 Mb
>>>
>>> The object itself is a DataFrame and was created using R 3.3.1 with
>>> S4Vectors version 0.11.4. I get the same error if I re-save the data on a
>>> Unix machine using R 3.3.0 (with S4Vectors from Bioc-release).
>>>
>>> Some Google leads mention a "corrupt file" or a hidden session Rdata
>>> file, but from the manual test everything looks fine, unless
>>> downloader::download() (or alternatively utils::download.file()) is
>>> corrupting the file.
>>>
>>>
>>> An option would be to include the data in the package, but I'd like to
>>> avoid doing so to minimize the package size. It already has a big
>>> data.frame that is necessary for the package to work. This short
>>> function is there for convenience.
>>>
>>> Best,
>>> Leo
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>
>>
>> --
>> Gabriel Becker, Ph.D
>> Associate Scientist
>> Bioinformatics and Computational Biology
>> Genentech Research
>>
>

