[Bioc-devel] Methods to speed up R CMD Check
Murphy, Alan E
a.murphy using imperial.ac.uk
Tue Mar 23 18:53:10 CET 2021
Hey Hervé,
Thanks for this, it is very helpful. I'm sorry, but I have one more question: where should I put option 3? I tried putting it all inside an .onLoad() function:
    .onLoad <- function(libname, pkgname) {
        .my_internal_global_variables <- new.env(parent=emptyenv())
        .get_eh <- function() get("eh", envir=.my_internal_global_variables)
        .set_eh <- function(value) assign("eh", value,
                                          envir=.my_internal_global_variables)
        toto <- function()
        {
            eh <- try(.get_eh(), silent=TRUE)
            if (inherits(eh, "try-error")) {
                eh <- ExperimentHub()
                .set_eh(eh)
            }
            eh
        }
        toto()
    }
This seems to work in that the script runs (I can tell from the output of devtools::check()), but I still get an error in my test functions saying that eh doesn't exist.
Kind regards,
Alan.
________________________________
From: Hervé Pagès <hpages.on.github using gmail.com>
Sent: 23 March 2021 17:31
To: Murphy, Alan E <a.murphy using imperial.ac.uk>; Martin Morgan <mtmorgan.bioc using gmail.com>; Kern, Lori <Lori.Shepherd using RoswellPark.org>; bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: Re: [Bioc-devel] Methods to speed up R CMD Check
3 ways to do this, one that doesn't work, and two that work ;-)
1. Simple way that doesn't work:
    ## Just a place holder. Will be initialized at run-time the first
    ## time it's needed.
    .some_internal_global_variable <- NULL

    toto <- function()
    {
        if (is.null(.some_internal_global_variable))
            .some_internal_global_variable <<- 55L
    }
However, if you put this in your package, you'll get the following
error the first time toto() is called:
cannot change value of locked binding for
'.some_internal_global_variable'
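The locked-binding error can be reproduced outside of a package. This is only a minimal sketch: the environment `ns` is a stand-in for a package namespace (an assumption made for illustration), and the binding is locked by hand with `lockBinding()` to mimic what happens to namespace bindings once loading has finished.

```r
# Stand-in for a package namespace (assumption: a plain environment).
ns <- new.env()
assign(".some_internal_global_variable", NULL, envir = ns)

# Lock the binding by hand, mimicking a namespace after it has been loaded.
lockBinding(".some_internal_global_variable", ns)

# Re-assigning now fails; try() captures the error instead of stopping.
res <- try(assign(".some_internal_global_variable", 55L, envir = ns),
           silent = TRUE)
msg <- conditionMessage(attr(res, "condition"))
```

Here `res` is a "try-error" object and `msg` holds the error message about the locked binding.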
2. Simple way that works: initialize the global variable in the
.onLoad() hook. This works because the .onLoad() hook is executed
right before the package namespace gets locked.
    ## Just a place holder. Will be initialized at load-time.
    .some_internal_global_variable <- NULL

    .onLoad <- function(libname, pkgname)
    {
        .some_internal_global_variable <<- 55L
    }
However, I don't really like using this approach when initialization
of the global variable requires access to the internet. It means that
in case of connectivity issue your users won't be able to load the
package and troubleshooting can become really tricky when you can't
even load a package. So in that case I prefer the solution below.
3. Define the internal global variable in an environment:
    .my_internal_global_variables <- new.env(parent=emptyenv())

    .get_eh <- function() get("eh", envir=.my_internal_global_variables)
    .set_eh <- function(value) assign("eh", value,
                                      envir=.my_internal_global_variables)

    toto <- function()
    {
        eh <- try(.get_eh(), silent=TRUE)
        if (inherits(eh, "try-error")) {
            eh <- ExperimentHub()
            .set_eh(eh)
        }
        eh
    }
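The caching pattern of option 3 can be tried without ExperimentHub or an internet connection. This is a rough sketch under stated assumptions: `make_hub()` is a hypothetical stand-in for `ExperimentHub()`, `n_calls` exists only to count constructor calls, and `.cache` and `get_hub()` are illustrative names. It checks for the cached object with `exists()` rather than `try()`, which is a variation on the code above, not the same code.

```r
# Private environment holding the mutable state.
.cache <- new.env(parent = emptyenv())

# Hypothetical stand-in for ExperimentHub(); counts how often it is called.
n_calls <- 0L
make_hub <- function() {
    n_calls <<- n_calls + 1L
    list(id = "fake hub")
}

# Return the cached object, creating it on first use only.
get_hub <- function() {
    if (!exists("eh", envir = .cache, inherits = FALSE))
        assign("eh", make_hub(), envir = .cache)
    get("eh", envir = .cache)
}

first <- get_hub()   # constructor runs here
second <- get_hub()  # served from the cache
```

After both calls `n_calls` is 1 and `first` and `second` are identical. In a package, definitions like these would sit at the top level of a file under R/, so that internal functions and tests can call the accessor; functions defined inside the body of another function are local to that call and disappear when it returns.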
Hope this helps,
H.
On 3/23/21 10:05 AM, Murphy, Alan E wrote:
> Hey Hervé,
>
> I get the idea now, thanks for clarifying. Where do I place the call to
> ExperimentHub in the package?
>
> eh <- ExperimentHub() # the only call to ExperimentHub() in
> # the entire R session
>
> The package accesses the datasets in internal functions, examples,
> tests and the vignette, so eh would need to be available to all of
> them. Sorry, I don't have much experience using experiment datasets.
>
> Kind regards,
> Alan.
>
> ------------------------------------------------------------------------
> *From:* Hervé Pagès <hpages.on.github using gmail.com>
> *Sent:* 23 March 2021 16:46
> *To:* Murphy, Alan E <a.murphy using imperial.ac.uk>; Martin Morgan
> <mtmorgan.bioc using gmail.com>; Kern, Lori <Lori.Shepherd using RoswellPark.org>;
> bioc-devel using r-project.org <bioc-devel using r-project.org>
> *Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check
>
> On 3/23/21 4:11 AM, Murphy, Alan E wrote:
>> Hi,
>>
>> Thank you very much Martin and Hervé for your suggestions. I have
>> reverted the .onLoad function in my zzz.R to the one advised by
>> ExperimentHub and have used the ID lookup
>> (system.time(tt_alzh <- eh[["EH5373"]])) in internal functions and
>> unit tests. However, the check is still taking ~18 minutes, so I need
>> to do a bit more work. Even with my new .onLoad function, calling
>> datasets by name still takes substantially longer; see below for the
>> example Hervé gave, run on my new code:
>>
>> a<-function(){
>> eh <- query(ExperimentHub(), "ewceData")
>
> The above line is not needed. Creating an ExperimentHub instance can be
> quite expensive and with the current approach 'R CMD check' will do it
> dozens of times. My suggestion was to create an ExperimentHub instance
> once for all the first time you need it, and reuse it in all your data
> access functions:
>
> eh <- ExperimentHub() # the only call to ExperimentHub() in
> # the entire R session
>
> Also there's no need to query(). Just use the EH ID directly on the
> ExperimentHub instance to load your data:
>
> eh[["EH5373"]]
>
> This should reduce 'R CMD check' by a few more minutes.
>
> H.
>
> --
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.github using gmail.com
--
Hervé Pagès
Bioconductor Core Team
hpages.on.github using gmail.com