[Bioc-devel] How to use RData files in Bioconductor data and software packages

web working webworking at posteo.de
Thu Jan 9 22:00:52 CET 2020

Hi Herve,

thank you for your detailed answer. I guess I did not express myself
clearly. The BED files were just examples of data I store in the
inst/extdata folder. Based on the description for ExperimentHubData I
have decided to create a software and a data package (no
ExperimentHubData software package). In my RData files I store software
package objects. These objects are bigger than 5 MB. Using a helper
function is not an option, because computing the objects takes too much
time. For this reason I want to load these objects for my example
functions. My question is whether storing my RData files in the
inst/extdata directory is correct or not.
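
For illustration, loading such a precomputed object could look roughly
like the sketch below. The helper name load_rdata_object and the file
name example_object.RData are hypothetical, and a temporary file stands
in for the installed dummyData package so that the snippet runs on its
own. Inside the software package the path would instead come from
system.file().

```r
# Hypothetical helper: load exactly one object from an .RData file into a
# private environment, so nothing is written into the caller's workspace.
load_rdata_object <- function(path) {
  # system.file() returns "" when the file is not found, hence nzchar().
  stopifnot(nzchar(path), file.exists(path))
  env <- new.env(parent = emptyenv())
  obj_names <- load(path, envir = env)   # names of the restored objects
  stopifnot(length(obj_names) == 1L)
  get(obj_names, envir = env)
}

# In the package the path would be obtained like this (assumed file name):
# path <- system.file("extdata", "example_object.RData", package = "dummyData")

# Stand-in for the demo: save a small object to a temporary file first.
path <- file.path(tempdir(), "example_object.RData")
example_object <- list(id = "demo", values = 1:3)
save(example_object, file = path)
rm(example_object)

obj <- load_rdata_object(path)
```

Loading into a separate environment avoids silently overwriting objects
of the same name in the user's workspace.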



On 09.01.20 at 17:59, Pages, Herve wrote:
> Hi Tobias,
> If the original data is in BED files, there should be no need to
> serialize the objects obtained by importing the files. It is **much**
> better to provide a small helper function that creates an object from a
> BED file and to use that function each time you need to load an object.
> This has at least 2 advantages:
> 1. It avoids redundant storage of the data.
> 2. By avoiding serialization of high-level S4 objects, it makes the
> package easier to maintain in the long run.
> Note that the helper function could also implement a cache mechanism
> (easy to do with an environment) so the BED file is only loaded and the
> object created the 1st time the function is called. On subsequent calls,
> the object is retrieved from the cache.
> However, if the BED files are really big (e.g. > 50 Mb), we require them
> to be stored on ExperimentHub instead of inside dummyData. Note that you
> still need to provide the dummyData package (which becomes what we call
> an ExperimentHub-based data package). See the "Creating An ExperimentHub
> Package" vignette in the ExperimentHubData package for more information
> about this.
> Hope this helps,
> H.
> On 1/9/20 04:45, web working wrote:
>> Dear all,
>> I am currently developing a software package (dummySoftware) and a data
>> package (dummyData) and I am a bit confused about where to store my RData
>> files in the data package. Here is my situation:
>> I want to store some software package objects (objects of new classes
>> defined in the software package) in the data package. These objects are
>> example objects and are too big for software packages. As I have read here
>> (http://r-pkgs.had.co.nz/data.html
>> ) all RData objects should be stored in the data directory of a package.
>> BED files of the data package are stored in inst/extdata.
>> The data of the data package will be accessed from the software package
>> like this: system.file('extdata', 'subset.bed', package = 'dummyData').
>> And here the problem occurs. After building the data package
>> (devtools::build(args = c('--resave-data'))), all data in data/ are
>> stored in a datalist, Rdata.rdb, Rdata.rds and Rdata.rdx and cannot be
>> addressed with system.file. Accessing these data with the data()
>> function results in a warning during BiocCheck::BiocCheck().
>> My solution is to store the RData files in the inst/extdata directory
>> and address them with system.file. Something similar is mentioned here,
>> but in the context of a vignette
>> (r-pkgs.had.co.nz/data.html#other-data). Is this the right way to do it?
>> Best,
>> Tobias
