[Bioc-devel] How to use RData files in Bioconductor data and software packages

Kasper Daniel Hansen k@@perd@n|e|h@n@en @end|ng |rom gm@||@com
Wed Jan 15 00:12:51 CET 2020


Tobias,

When you use the data() command on the data package, you need to do
  library(dummyData)
first (and you therefore need to list dummyData under Suggests: in the
software package's DESCRIPTION).

Here is an example from minfi/minfiData

if (require(minfiData)) {
  dat <- preprocessIllumina(RGsetEx, bg.correct=FALSE, normalize="controls")
}

Note how I use require to load the package. For clarity you could argue I
should also have
  data(RGsetEx)
but it is technically not necessary because of lazy loading.
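
As a rough sketch of the same guarded-loading pattern in a reusable form:
the helper name load_example() is hypothetical, and the built-in "datasets"
package stands in for a data package such as dummyData so that the snippet
runs anywhere. It returns NULL when the (suggested) package is not installed
instead of failing.

```r
# Hypothetical helper: fetch one dataset from an optional (Suggests:) package.
load_example <- function(name, package) {
  # The data package may not be installed, so check before touching it.
  if (!requireNamespace(package, quietly = TRUE)) {
    return(NULL)
  }
  # Load into a private environment rather than the caller's workspace.
  env <- new.env()
  data(list = name, package = package, envir = env)
  get(name, envir = env)
}

# "datasets" and "mtcars" are stand-ins for dummyData and one of its objects.
x <- load_example("mtcars", "datasets")
```

In an example section one would then wrap the actual computation in
if (!is.null(x)) { ... }, much like the require() guard above.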





On Thu, Jan 9, 2020 at 4:40 PM Pages, Herve <hpages using fredhutch.org> wrote:

> On 1/9/20 13:00, web working wrote:
> > Hi Herve,
> >
> > thank you for your detailed answer. I guess I have expressed myself
> > unclearly. The BED files were just examples of data I store in the
> > inst/extdata folder. Based on the description for ExperimentHubData I
> > have decided to create a software and a data package (no
> > ExperimentHubData software package). In my RData files I store software
> > package objects. These objects are bigger than 5 MB. Using a helper
> > function is not an option, because computing the objects takes too much
> > time. For this reason I want to load these objects for my example
> > functions. My question is whether storing my RData files in the
> > inst/extdata directory is correct or not.
>
> It's technically correct but it's not as convenient as putting them in
> data/ because they can no longer be listed and/or loaded with data().
> So if you're storing them in inst/extdata only because the data()
> solution gave you a BiocCheck warning then I'd say that you're giving up
> too easily ;-)
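
For comparison, here is a minimal sketch of what the inst/extdata route
involves. The helper name load_rdata() and the file name are hypothetical; in
a real package the path would come from something like
system.file("extdata", "example.RData", package = "dummyData"), which returns
an empty string when the file is not found. A temporary file stands in for
the installed file so the snippet is self-contained.

```r
# Hypothetical helper: load every object stored in an .RData file and
# return them as a named list instead of writing into the workspace.
load_rdata <- function(path) {
  # system.file() returns "" for a missing file, so reject that early.
  stopifnot(nzchar(path), file.exists(path))
  env <- new.env(parent = emptyenv())
  object_names <- load(path, envir = env)
  mget(object_names, envir = env)
}

# Stand-in for a file shipped in inst/extdata of the data package.
tmp <- tempfile(fileext = ".RData")
example_obj <- list(id = "demo", values = 1:3)
save(example_obj, file = tmp)

objs <- load_rdata(tmp)
```

Compared with data(), the caller has to know the file name, and the objects
are not listed by data(package = "dummyData").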
>
> IMO it is important to try to understand why the data() solution gave
> you a BiocCheck warning in the first place. Unfortunately you're not
> providing enough information for us to be able to tell. What does the
> warning say? How can we reproduce the warning? Ideally we would need to
> see a transcript of your session and links to your packages.
>
> Thanks,
> H.
>
>
> >
> > Best,
> >
> > Tobias
> >
> > Am 09.01.20 um 17:59 schrieb Pages, Herve:
> >> Hi Tobias,
> >>
> >> If the original data is in BED files, there should be no need to
> >> serialize the objects obtained by importing the files. It is **much**
> >> better to provide a small helper function that creates an object from a
> >> BED file and to use that function each time you need to load an object.
> >>
> >> This has at least 2 advantages:
> >> 1. It avoids redundant storage of the data.
> >> 2. By avoiding serialization of high-level S4 objects, it makes the
> >> package easier to maintain in the long run.
> >>
> >> Note that the helper function could also implement a cache mechanism
> >> (easy to do with an environment) so the BED file is only loaded and the
> >> object created the 1st time the function is called. On subsequent calls,
> >> the object is retrieved from the cache.
> >>
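
A minimal sketch of such an environment-based cache follows. All names
(.bed_cache, load_cached) are hypothetical, and base R's read.delim() is used
as a stand-in reader; a real package would more likely use a dedicated BED
importer.

```r
# Private environment used as an in-memory cache, keyed by file path.
.bed_cache <- new.env(parent = emptyenv())

# Hypothetical helper: read a file once, then serve it from the cache.
load_cached <- function(path,
                        reader = function(p) read.delim(p, header = FALSE)) {
  key <- normalizePath(path, mustWork = TRUE)
  if (!exists(key, envir = .bed_cache, inherits = FALSE)) {
    # First call for this file: parse it and remember the result.
    assign(key, reader(key), envir = .bed_cache)
  }
  get(key, envir = .bed_cache, inherits = FALSE)
}

# Demonstration with a tiny temporary BED file and a reader that counts calls.
bed <- tempfile(fileext = ".bed")
writeLines("chr1\t100\t200", bed)
calls <- 0
counting_reader <- function(p) {
  calls <<- calls + 1
  read.delim(p, header = FALSE)
}
first  <- load_cached(bed, counting_reader)
second <- load_cached(bed, counting_reader)  # served from the cache
```

The cache lives only for the current R session, so nothing serialized has to
be shipped or maintained.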
> >> However, if the BED files are really big (e.g. > 50 Mb), we require them
> >> to be stored on ExperimentHub instead of inside dummyData. Note that you
> >> still need to provide the dummyData package (which becomes what we call
> >> an ExperimentHub-based data package). See the "Creating An ExperimentHub
> >> Package" vignette in the ExperimentHubData package for more information
> >> about this.
> >>
> >> Hope this helps,
> >>
> >> H.
> >>
> >> On 1/9/20 04:45, web working wrote:
> >>> Dear all,
> >>>
> >>> I am currently developing a software package (dummySoftware) and a data
> >>> package (dummyData) and I am a bit confused about where to store my
> >>> RData files in the data package. Here is my situation:
> >>>
> >>> I want to store some software package objects (new class objects of the
> >>> software package) in the data package. These objects are example objects
> >>> and are too big for software packages. As I have read here
> >>> (http://r-pkgs.had.co.nz/data.html) all RData objects should be stored
> >>> in the data directory of a package.
> >>> BED files of the data package are stored in inst/extdata.
> >>> The data in the data package will be accessed from the software package
> >>> like this: system.file('extdata', 'subset.bed', package = 'dummyData').
> >>> And here the problem occurs. After building the data package
> >>> (devtools::build(args = c('--resave-data'))), all data in data/ are
> >>> stored in a datalist, Rdata.rdb, Rdata.rds and Rdata.rdx and cannot be
> >>> accessed with system.file. Accessing this data with the data()
> >>> function results in a warning during BiocCheck::BiocCheck().
> >>>
> >>> My solution is to store the RData files in the inst/extdata directory
> >>> and address them with system.file. Something similar is mentioned here,
> >>> but in the context of a vignette
> >>> (r-pkgs.had.co.nz/data.html#other-data). Is this the right way to do it?
> >>>
> >>> Best,
> >>> Tobias
> >>>
> >>> _______________________________________________
> >>> Bioc-devel using r-project.org mailing list
> >>>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__stat.ethz.ch_mailman_listinfo_bioc-2Ddevel&d=DwICAg&c=eRAMFD45gAfqt84VtBcfhQ&r=BK7q3XeAvimeWdGbWY_wJYbW0WYiZvSXAJJKaaPhzWA&m=GaTKqVd_WDqMRk0dW7SYkjVlgCzt0I0bACHfb1iIOVc&s=GYaoH8LeSP0tdY4PoOHEdDMGhzLC2gHcNGtKjVLZV-8&e=
> >>>
> >>>
> >
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages using fredhutch.org
> Phone:  (206) 667-5791
> Fax:    (206) 667-1319
>


-- 
Best,
Kasper
