[Bioc-devel] How to use RData files in Bioconductor data and software packages

web working (webworking at posteo.de)
Thu Jan 16 16:25:37 CET 2020


Thank you for your example, Kasper. The require() approach looks like a 
workable option for me. I am following the Bioconductor "Circular 
Dependencies" guidelines 
(https://github.com/Bioconductor/Contributions/blob/master/CONTRIBUTING.md#submitting-related-packages) 
to implement my software and data packages, connecting them through the 
"Suggests" and "Depends" fields.

On 15.01.20 at 00:12, Kasper Daniel Hansen wrote:
> Tobias,
>
> When you use the data() command on the data package, you need to do
>    library(dummyData)
> first (and you therefore need to list dummyData under Suggests:)
>
> Here is an example from minfi/minfiData
>
> if (require(minfiData)) {
>    dat <- preprocessIllumina(RGsetEx, bg.correct=FALSE, normalize="controls")
> }
>
> Note how I use require to load the package. For clarity you could argue I
> should also have
>    data(RGsetEx)
> but it is technically not necessary because of lazy loading.
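
The guarded-loading pattern above can be sketched without depending on any particular data package. This is only an illustration under assumptions: `with_data_package()` is a hypothetical helper name, `dummyData` is the placeholder package from this thread, and requireNamespace() is used here instead of require() so that the search path is left untouched (with require(), as in the minfi example, lazy-loaded objects become visible by name).

```r
# Run `code` only if the suggested data package is available.
# `with_data_package()` is a hypothetical helper, not part of any package.
with_data_package <- function(pkg, code) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    code()
  } else {
    message("Package '", pkg, "' is not installed; skipping example.")
    invisible(NULL)
  }
}

# 'dummyData' is the placeholder data package name used in this thread.
result <- with_data_package("dummyData", function() {
  # List the datasets shipped in the package's data/ directory.
  data(package = "dummyData")$results[, "Item"]
})
```

If the data package is missing, the example code is skipped with a message instead of failing, which is the behaviour one wants for a package that is only suggested.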
>
>
>
>
>
> On Thu, Jan 9, 2020 at 4:40 PM Pages, Herve <hpages at fredhutch.org> wrote:
>
>> On 1/9/20 13:00, web working wrote:
>>> Hi Herve,
>>>
>>> thank you for your detailed answer. I guess I expressed myself
>>> unclearly. The BED files were just examples of data I store in the
>>> inst/extdata folder. Based on the description for ExperimentHubData I
>>> have decided to create a software and a data package (no
>>> ExperimentHubData software package). In my RData files I store software
>>> package objects. These objects are bigger than 5 MB. Using a helper
>>> function is not an option, because computing the objects takes too much
>>> time. For this reason I want to load these objects for my example
>>> functions. My question is whether storing my RData files in the
>>> inst/extdata directory is correct or not.
>> It's technically correct but it's not as convenient as putting them in
>> data/ because they can no longer be listed and/or loaded with data().
>> So if you're storing them in inst/extdata only because the data()
>> solution gave you a BiocCheck warning then I'd say that you're giving up
>> too easily ;-)
>>
>> IMO it is important to try to understand why the data() solution gave
>> you a BiocCheck warning in the first place. Unfortunately you're not
>> providing enough information for us to be able to tell. What does the
>> warning say? How can we reproduce the warning? Ideally we would need to
>> see a transcript of your session and links to your packages.
>>
>> Thanks,
>> H.
>>
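
The trade-off described above can be illustrated with base R alone. The sketch below writes a stand-in object to a temporary .RData file, which plays the role of a file under inst/extdata, and loads it into a dedicated environment. The object and file names are hypothetical; in an installed package the path would come from system.file() rather than tempdir().

```r
# Stand-in for an example object shipped by a data package (hypothetical).
exampleObject <- list(id = "demo", values = 1:5)

# Stand-in for a file in inst/extdata; in an installed package the path
# would instead come from something like
#   system.file("extdata", "exampleObject.RData", package = "dummyData")
rdata_path <- file.path(tempdir(), "exampleObject.RData")
save(exampleObject, file = rdata_path)
rm(exampleObject)

# load() returns the names of the restored objects; loading into a fresh
# environment avoids overwriting anything in the caller's workspace.
env <- new.env()
loaded_names <- load(rdata_path, envir = env)
restored <- get(loaded_names[1], envir = env)
```

Files stored this way are not visible to data(), which is the convenience that is lost compared with placing them in data/.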
>>
>>> Best,
>>>
>>> Tobias
>>>
>>> On 09.01.20 at 17:59, Pages, Herve wrote:
>>>> Hi Tobias,
>>>>
>>>> If the original data is in BED files, there should be no need to
>>>> serialize the objects obtained by importing the files. It is **much**
>>>> better to provide a small helper function that creates an object from a
>>>> BED file and to use that function each time you need to load an object.
>>>>
>>>> This has at least 2 advantages:
>>>> 1. It avoids redundant storage of the data.
>>>> 2. By avoiding serialization of high-level S4 objects, it makes the
>>>> package easier to maintain in the long run.
>>>>
>>>> Note that the helper function could also implement a cache mechanism
>>>> (easy to do with an environment) so the BED file is only loaded and the
>>>> object created the 1st time the function is called. On subsequent calls,
>>>> the object is retrieved from the cache.
>>>>
>>>> However, if the BED files are really big (e.g. > 50 Mb), we require them
>>>> to be stored on ExperimentHub instead of inside dummyData. Note that you
>>>> still need to provide the dummyData package (which becomes what we call
>>>> an ExperimentHub-based data package). See the "Creating An ExperimentHub
>>>> Package" vignette in the ExperimentHubData package for more information
>>>> about this.
>>>>
>>>> Hope this helps,
>>>>
>>>> H.
>>>>
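
The cache mechanism mentioned above (an environment that holds the object after the first load) might look roughly like the following. This is a minimal sketch under assumptions: `load_bed()` and `.cache` are hypothetical names, read.table() stands in for a real BED importer, and the BED coordinates are made up for the demonstration.

```r
# Private environment used as a cache; in a package this would live at the
# top level of an R/ file.
.cache <- new.env(parent = emptyenv())

# Hypothetical helper: read a BED file once and reuse the result afterwards.
# read.table() stands in for a real importer such as rtracklayer::import().
load_bed <- function(path) {
  key <- normalizePath(path)
  if (!exists(key, envir = .cache, inherits = FALSE)) {
    bed <- read.table(path, sep = "\t", header = FALSE,
                      col.names = c("chrom", "start", "end"),
                      stringsAsFactors = FALSE)
    assign(key, bed, envir = .cache)
  }
  get(key, envir = .cache, inherits = FALSE)
}

# Demonstration with a small temporary BED file (made-up coordinates).
bed_path <- tempfile(fileext = ".bed")
writeLines(c("chr1\t100\t200", "chr2\t300\t400"), bed_path)
first  <- load_bed(bed_path)   # reads the file
second <- load_bed(bed_path)   # served from the cache
```

Keying the cache on the normalized path means that several files can share one cache environment without colliding.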
>>>> On 1/9/20 04:45, web working wrote:
>>>>> Dear all,
>>>>>
>>>>> I am currently developing a software package (dummySoftware) and a data
>>>>> package (dummyData), and I am a bit confused about where to store my
>>>>> RData files in the data package. Here is my situation:
>>>>>
>>>>> I want to store some software package objects (objects of new classes
>>>>> defined in the software package) in the data package. These are example
>>>>> objects that are too big for a software package. As I have read here
>>>>> (http://r-pkgs.had.co.nz/data.html), all RData objects should be stored
>>>>> in the data directory of a package.
>>>>> BED files of the data package are stored in inst/extdata.
>>>>> The data of the data package will be addressed in the software package
>>>>> like this: system.file('extdata', 'subset.bed', package = 'dummyData').
>>>>> And here the problem occurs. After building the data package
>>>>> (devtools::build(args = c('--resave-data'))), all data in data/ are
>>>>> stored in a datalist, Rdata.rdb, Rdata.rds and Rdata.rdx and cannot be
>>>>> addressed with system.file. Addressing this data with the data()
>>>>> function results in a warning during BiocCheck::BiocCheck().
>>>>>
>>>>> My solution is to store the RData files in the inst/extdata directory
>>>>> and address them with system.file. Something similar is mentioned here,
>>>>> but in the context of a vignette
>>>>> (r-pkgs.had.co.nz/data.html#other-data). Is this the right way to do it?
>>>>>
>>>>> Best,
>>>>> Tobias
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fredhutch.org
>> Phone:  (206) 667-5791
>> Fax:    (206) 667-1319
>>
>


