[Bioc-devel] Windows-only issue with downloading a Rdata file and loading it with R

Leonardo Collado Torres lcollado at jhu.edu
Sun Jun 19 16:38:28 CEST 2016


Thanks Martin! Using mode='wb' solved the issue.
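
For anyone who finds this thread later, here is a minimal sketch of the fix,
assuming the cause was the one Martin points to: without mode = "wb" the
file is written in text mode, which on Windows can alter the bytes of a
binary file. The helper name download_and_load() and the small demo data
frame are made up for illustration; only utils::download.file() and its
mode argument are the real API. As far as I know downloader::download()
passes extra arguments such as mode on to download.file(), but treat that
as an assumption and check the package documentation.

```r
# Hypothetical helper (not from the original package): download an Rdata
# file in binary mode and load it into a fresh environment.
download_and_load <- function(url, destfile = tempfile(fileext = ".Rdata")) {
    # mode = "wb" writes the downloaded bytes unchanged. The default
    # mode = "w" is text mode, which on Windows can corrupt binary files.
    utils::download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
    env <- new.env()
    load(destfile, envir = env)
    env
}

# Self-contained check that needs no network access: expose a local file
# through a file:// URL and confirm the object survives the round trip.
src <- tempfile(fileext = ".Rdata")
metadata_demo <- data.frame(run = c("SRR1", "SRR2"), reads = c(10L, 20L))
save(metadata_demo, file = src)

src_path <- normalizePath(src, winslash = "/")
# A Windows path such as C:/... needs an extra slash after file://
src_url <- paste0("file://", if (startsWith(src_path, "/")) "" else "/", src_path)

loaded <- download_and_load(src_url)
stopifnot(identical(loaded$metadata_demo, metadata_demo))
```

With the file from this thread, the call would be
download_and_load('http://www.biostat.jhsph.edu/~lcollado/recount/metadata_clean_sra.Rdata'),
assuming that URL is still being served.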

On Sat, Jun 18, 2016 at 10:41 AM, Martin Morgan
<martin.morgan at roswellpark.org> wrote:
> On 06/18/2016 12:58 AM, Leonardo Collado Torres wrote:
>>
>> Hi,
>>
>> I get the same error while hosting the data somewhere else or when using
>> RawGit's url. That is:
>>
>>> library('downloader')
>>> download('http://www.biostat.jhsph.edu/~lcollado/recount/metadata_clean_sra.Rdata',
>> destfile = 'test2.Rdata')
>>>
>>> load('test2.Rdata')
>>
>> Error: ReadItem: unknown type 50, perhaps written by later version of R
>>>
>>> download('https://cdn.rawgit.com/leekgroup/recount-website/master/metadata/metadata_clean_sra.Rdata',
>> destfile = 'test3.Rdata')
>>>
>>> load('test3.Rdata')
>>
>> Error: ReadItem: unknown type 50, perhaps written by later version of R
>>
>> Again, it only happens on Windows but not on the other OS. So it doesn't
>> look like a GitHub issue.
>
>
> use mode="wb" to download in binary mode.
>
> Martin
>
>>
>> Best,
>> Leo
>>
>>
>> On Fri, Jun 17, 2016 at 4:57 PM, Gabe Becker <becker.gabe at gene.com> wrote:
>>
>>> I wonder if raw only means "raw after line return munging"? Can you
>>> attach the file that gets downloaded via email? (Off list is fine.)
>>>
>>> On Fri, Jun 17, 2016 at 1:44 PM, Leonardo Collado Torres
>>> <lcollado at jhu.edu> wrote:
>>>
>>>
>>>> Hi,
>>>>
>>>> I'm trying to figure out what is going wrong with an error that pops
>>>> up on Windows only. It's currently the only error for a package that I
>>>> recently submitted to Bioc. The function is fairly simple: it
>>>> downloads an Rdata file from the web and loads it.
>>>>
>>>> If I try to download and load the file with R, the following error
>>>> occurs (only on Windows):
>>>>
>>>>
>>>>> library('downloader')
>>>>> download('https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_sra.Rdata?raw=true',
>>>> destfile = 'test.Rdata')
>>>> trying URL 'https://github.com/leekgroup/recount-website/blob/master/metadata/metadata_clean_sra.Rdata?raw=true'
>>>> Content type 'application/octet-stream' length 2531337 bytes (2.4 MB)
>>>> downloaded 2.4 MB
>>>>
>>>>> load('test.Rdata')
>>>>
>>>> Error: ReadItem: unknown type 50, perhaps written by later version of R
>>>>>
>>>>> traceback()
>>>>
>>>> 1: load("test.Rdata")
>>>>>
>>>>> options(width = 120)
>>>>> devtools::session_info()
>>>>
>>>> Session info
>>>>
>>>> -----------------------------------------------------------------------------------------------------------
>>>>   setting  value
>>>>   version  R version 3.3.0 (2016-05-03)
>>>>   system   x86_64, mingw32
>>>>   ui       Rgui
>>>>   language (EN)
>>>>   collate  English_United States.1252
>>>>   tz       America/New_York
>>>>   date     2016-06-17
>>>>
>>>> Packages
>>>>
>>>> ---------------------------------------------------------------------------------------------------------------
>>>>   package    * version date       source
>>>>   devtools     1.11.1  2016-04-21 CRAN (R 3.3.0)
>>>>   digest       0.6.9   2016-01-08 CRAN (R 3.3.0)
>>>>   downloader * 0.4     2015-07-09 CRAN (R 3.3.0)
>>>>   memoise      1.0.0   2016-01-29 CRAN (R 3.3.0)
>>>>   withr        1.0.1   2016-02-04 CRAN (R 3.3.0)
>>>>>
>>>>>
>>>>
>>>>
>>>> If I open the same url on my browser and manually download the file,
>>>> then everything works as shown below:
>>>>
>>>>> load('metadata_clean_sra.Rdata')
>>>>> metadata_clean
>>>>
>>>> Loading required package: S4Vectors
>>>> Loading required package: stats4
>>>> Loading required package: BiocGenerics
>>>> Loading required package: parallel
>>>> ## removed more output
>>>>
>>>>> options(width = 120)
>>>>> devtools::session_info()
>>>>
>>>> Session info
>>>>
>>>> -----------------------------------------------------------------------------------------------------------
>>>>   setting  value
>>>>   version  R version 3.3.0 (2016-05-03)
>>>>   system   x86_64, mingw32
>>>>   ui       Rgui
>>>>   language (EN)
>>>>   collate  English_United States.1252
>>>>   tz       America/New_York
>>>>   date     2016-06-17
>>>>
>>>> Packages
>>>>
>>>> ---------------------------------------------------------------------------------------------------------------
>>>>   package      * version date       source
>>>>   BiocGenerics * 0.19.1  2016-06-17 Bioconductor
>>>>   devtools       1.11.1  2016-04-21 CRAN (R 3.3.0)
>>>>   digest         0.6.9   2016-01-08 CRAN (R 3.3.0)
>>>>   IRanges      * 2.7.2   2016-06-07 Bioconductor
>>>>   memoise        1.0.0   2016-01-29 CRAN (R 3.3.0)
>>>>   S4Vectors    * 0.11.3  2016-06-03 Bioconductor
>>>>   withr          1.0.1   2016-02-04 CRAN (R 3.3.0)
>>>>>
>>>>> print(object.size(metadata_clean), units = 'Mb')
>>>>
>>>> 30.5 Mb
>>>>
>>>> The object itself is a DataFrame and was created using R 3.3.1 with
>>>> S4Vectors version 0.11.4. I get the same error if, on a Unix machine,
>>>> I re-save the data using R 3.3.0 (with S4Vectors from Bioc-release).
>>>>
>>>> Some Google leads point to a "corrupt file" or something about a hidden
>>>> session Rdata file. But from the manual test, everything looks fine,
>>>> unless downloader::download() (or alternatively utils::download.file())
>>>> is corrupting the file.
>>>>
>>>>
>>>> An option would be to include the data in the package, but I'd like to
>>>> avoid doing so to minimize the package size. It already has a big
>>>> data.frame that is necessary for the package to work. This short
>>>> function is there for convenience.
>>>>
>>>> Best,
>>>> Leo
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>>>
>>>
>>>
>>> --
>>> Gabriel Becker, Ph.D
>>> Associate Scientist
>>> Bioinformatics and Computational Biology
>>> Genentech Research
>>>
>>