[Bioc-devel] How to use RData files in Bioconductor data and software packages

Pages, Herve hpages at fredhutch.org
Thu Jan 9 17:59:15 CET 2020


Hi Tobias,

If the original data is in BED files, there should be no need to 
serialize the objects obtained by importing the files. It is **much** 
better to provide a small helper function that creates an object from a 
BED file and to use that function each time you need to load an object.

This has at least 2 advantages:
1. It avoids redundant storage of the data.
2. By avoiding serialization of high-level S4 objects, it makes the 
package easier to maintain in the long run.

Note that the helper function could also implement a cache mechanism 
(easy to do with an environment) so the BED file is only loaded and the 
object created the 1st time the function is called. On subsequent calls, 
the object is retrieved from the cache.
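
A minimal sketch of such a helper with an environment-based cache might look 
like the following. The names .bed_cache, read_bed3() and load_bed_object() 
are made up for illustration, and the default importer is a plain base-R 
reader so that the sketch runs without any extra package. In a real package 
the importer would more likely be rtracklayer::import() or the constructor of 
the software package's own class.

```r
# Cache environment; in a package this would be defined once at the top
# level of a file in R/ (hypothetical name).
.bed_cache <- new.env(parent = emptyenv())

# Illustrative importer: reads the first 3 BED columns into a data.frame.
# Stand-in for whatever function builds the real high-level object.
read_bed3 <- function(path) {
    df <- read.table(path, sep = "\t", header = FALSE,
                     stringsAsFactors = FALSE)[, 1:3]
    colnames(df) <- c("chrom", "start", "end")
    df
}

# Hypothetical helper: import the file the 1st time it is requested and
# return the cached object on subsequent calls.
load_bed_object <- function(path, importer = read_bed3) {
    key <- normalizePath(path)
    if (!exists(key, envir = .bed_cache, inherits = FALSE)) {
        assign(key, importer(path), envir = .bed_cache)
    }
    get(key, envir = .bed_cache, inherits = FALSE)
}

# Small demonstration with a temporary BED file.
bed <- tempfile(fileext = ".bed")
writeLines(c("chr1\t0\t100", "chr2\t50\t150"), bed)
x <- load_bed_object(bed)  # file is read and the object is cached
y <- load_bed_object(bed)  # object comes from the cache
```

In the software package, the path would come from something like 
system.file('extdata', 'subset.bed', package = 'dummyData') rather than from 
a temporary file.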

However, if the BED files are really big (e.g. > 50 MB), we require them 
to be stored on ExperimentHub instead of inside dummyData. Note that you 
still need to provide the dummyData package (which becomes what we call 
an ExperimentHub-based data package). See the "Creating An ExperimentHub 
Package" vignette in the ExperimentHubData package for more information 
about this.

Hope this helps,

H.

On 1/9/20 04:45, web working wrote:
> Dear all,
> 
> I am currently developing a software package (dummySoftware) and a data 
> package (dummyData) and I am a bit confused about where to store my RData 
> files in the data package. Here is my situation:
> 
> I want to store some software package objects (new class objects of the 
> software package) in the data package. These objects are example objects 
> and are too big for software packages. As I have read here 
> (http://r-pkgs.had.co.nz/data.html) all RData objects should be stored in 
> the data directory of a package. 
> BED files of the data package are stored in inst/extdata.
> The data of the data package will be addressed in the software package 
> like this: system.file('extdata', 'subset.bed', package = 'dummyData'). 
> And here the problem occurs. After building the data package 
> (devtools::build(args = c('--resave-data'))), all data in data/ are 
> stored in a datalist, Rdata.rdb, Rdata.rds and Rdata.rdx and cannot be 
> addressed with system.file. Addressing this data with the data() 
> function results in a warning during BiocCheck::BiocCheck().
> 
> My solution is to store the RData files in the inst/extdata directory 
> and address them with system.file. Something similar is mentioned here, 
> but in the context of a vignette 
> (r-pkgs.had.co.nz/data.html#other-data). Is this the right way to do it?
> 
> Best,
> Tobias
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages using fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
