[Bioc-devel] Methods to speed up R CMD Check
Hervé Pagès
hp@ge@@on@g|thub @end|ng |rom gm@||@com
Tue Mar 23 20:08:12 CET 2021
Doesn't really matter where you put them but
.my_internal_global_variables, .get_eh(), .set_eh(), and toto() don't
need to be defined inside the .onLoad() hook, and it's actually
cleaner/easier to not define them there. You can define there anywhere
in your ewceData/R/* files.
Here is a slightly improved version of the code:
.my_internal_global_variables <- new.env(parent=emptyenv())
.get_eh <- function() get("eh", envir=.my_internal_global_variables)
.set_eh <- function(value) assign("eh", value,
envir=.my_internal_global_variables)
get_ExperimentHub <- function()
{
eh <- try(.get_eh(), silent=TRUE)
if (inherits(eh, "try-error")) {
eh <- ExperimentHub::ExperimentHub()
.set_eh(eh)
}
eh
}
In my packages I like to put this kind of low-level stuff in R/utils.R.
None of this is exported.
Then anywhere in your package when you need an ExperimentHub instance, do:
## Exported.
tt_alzh <- function()
{
eh <- get_ExperimentHub()
eh[["EH5373"]]
}
H.
On 3/23/21 10:53 AM, Murphy, Alan E wrote:
> HeyHervé,
>
> Thanks for this it is very helpful and I'm very sorry but I have one
> more question, where to put option 3? I thought making an onload r
> script for it:
>
> .onLoad <- function(libname, pkgname) {
> .my_internal_global_variables <- new.env(parent=emptyenv())
> .get_eh <- function() get("eh", envir=.my_internal_global_variables)
> .set_eh <- function(value) assign("eh", value,
> envir=.my_internal_global_variables)
> toto <- function()
> {
> eh <- try(.get_eh(), silent=TRUE)
> if (inherits(eh, "try-error")) {
> eh <- ExperimentHub()
> .set_eh(eh)
> }
> eh
> }
> toto()
> }
>
> This seems to work in that the script runs (I can tell based on the
> output with devtools::check()) but I still get an error that eh doesn't
> exist in my test functions.
>
> Kind regards,
> Alan.
> ------------------------------------------------------------------------
> *From:* Hervé Pagès <hpages.on.github using gmail.com>
> *Sent:* 23 March 2021 17:31
> *To:* Murphy, Alan E <a.murphy using imperial.ac.uk>; Martin Morgan
> <mtmorgan.bioc using gmail.com>; Kern, Lori <Lori.Shepherd using RoswellPark.org>;
> bioc-devel using r-project.org <bioc-devel using r-project.org>
> *Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check
> 3 ways to do this, one that doesn't work, and two that work ;-)
>
>
> 1. Simple way that doesn't work:
>
> ## Just a place holder. Will be initialized at run-time the first
> ## time it's needed.
> .some_internal_global_variable <- NULL
>
> toto <- function()
> {
> if (is.null(.some_global_variable))
> .some_internal_global_variable <<- 55L
> }
>
> However, if you put this in your package, you'll get the following
> error the first time toto() is called:
>
> cannot change value of locked binding for
> '.some_internal_global_variable'
>
> 2. Simple way that works: initialize the global variable in the
> .onLoad() hook. This works because the .onLoad() hook is executed
> right before the package namespace gets locked.
>
> ## Just a place holder. Will be initialized at load-time.
> .some_internal_global_variable <- NULL
>
> .onLoad <- function(libname, pkgname)
> {
> .some_internal_global_variable <<- 55L
> }
>
> However, I don't really like using this approach when initialization
> of the global variable requires access to the internet. It means that
> in case of connectivity issue your users won't be able to load the
> package and troubleshooting can become really tricky when you can't
> even load a package. So in that case I prefer the solution below.
>
> 3. Define the internal global variable in an environment:
>
> .my_internal_global_variables <- new.env(parent=emptyenv())
>
> .get_eh <- function() get("eh", envir=.my_internal_global_variables)
> .set_eh <- function(value) assign("eh", value,
> envir=.my_internal_global_variables)
>
> toto <- function()
> {
> eh <- try(.get_eh(), silent=TRUE)
> if (inherits(eh, "try-error")) {
> eh <- ExperimentHub()
> .set_eh(eh)
> }
> eh
> }
>
> Hope this helps,
>
> H.
>
>
>
>
> On 3/23/21 10:05 AM, Murphy, Alan E wrote:
>> Hey Hervé,
>>
>> I get the idea now thanks for clarifying. Where do I place the call to
>> ExperimentHub in the package?:
>>
>> eh <- ExperimentHub() # the only call to ExperimentHub() in
>> # the entire R session
>>
>> The package contains calls to the datasets in internal functions,
>> examples, tests and the vignette so eh it would need to be available to
>> all. Sorry I don't have much experience using experiment datasets.
>>
>> Kind regards,
>> Alan.
>>
>> ------------------------------------------------------------------------
>> *From:* Hervé Pagès <hpages.on.github using gmail.com>
>> *Sent:* 23 March 2021 16:46
>> *To:* Murphy, Alan E <a.murphy using imperial.ac.uk>; Martin Morgan
>> <mtmorgan.bioc using gmail.com>; Kern, Lori <Lori.Shepherd using RoswellPark.org>;
>> bioc-devel using r-project.org <bioc-devel using r-project.org>
>> *Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check
>>
>> *******************
>> This email originates from outside Imperial. Do not click on links and
>> attachments unless you recognise the sender.
>> If you trust the sender, add them to your safe senders list
>> https://spam.ic.ac.uk/SpamConsole/Senders.aspx
> <https://spam.ic.ac.uk/SpamConsole/Senders.aspx>
>> <https://spam.ic.ac.uk/SpamConsole/Senders.aspx
> <https://spam.ic.ac.uk/SpamConsole/Senders.aspx>> to disable email
>> stamping for this address.
>> *******************
>> On 3/23/21 4:11 AM, Murphy, Alan E wrote:
>>> Hi,
>>>
>>> Thank you very much Martin and Hervé for your suggestions. I have reverted my zzz.R on load function to that advised by ExperimentHub and had used the ID look up (system.time(tt_alzh <- eh[["EH5373"]])) on internal functions and unit tests. However, the check is still taking ~18 minutes so I need to do a bit more work.
> Even with
>> my new on load function, calling datasets by name still takes
>> substantially longer, see below for the example Hervé gave on my new code:
>>>
>>> a<-function(){
>>> eh <- query(ExperimentHub(), "ewceData")
>>
>> The above line is not needed. Creating an ExperimentHub instance can be
>> quite expensive and with the current approach 'R CMD check' will do it
>> dozens of times. My suggestion was to create an ExperimentHub instance
>> once for all the first time you need it, and reuse it in all your data
>> access functions:
>>
>> eh <- ExperimentHub() # the only call to ExperimentHub() in
>> # the entire R session
>>
>> Also there's no need to query(). Just use the EHb ID directly on the
>> ExperimentHub instance to load your data:
>>
>> eh[["EH5373"]]
>>
>> This should reduce 'R CMD check' by a few more minutes.
>>
>> H.
>>
>> --
>> Hervé Pagès
>>
>> Bioconductor Core Team
>> hpages.on.github using gmail.com
>
> --
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.github using gmail.com
--
Hervé Pagès
Bioconductor Core Team
hpages.on.github using gmail.com
More information about the Bioc-devel
mailing list