[Bioc-devel] Methods to speed up R CMD Check

Murphy, Alan E @@murphy @end|ng |rom |mper|@|@@c@uk
Tue Mar 23 21:31:53 CET 2021


Hi Herv�,

My apologies but I don't think I follow your approach. I have put your code below into a utils.R script in ewceData:

   .my_internal_global_variables <- new.env(parent=emptyenv())

   .get_eh <- function() get("eh", envir=.my_internal_global_variables)
   .set_eh <- function(value) assign("eh", value,
envir=.my_internal_global_variables)

   get_ExperimentHub <- function()
   {
     eh <- try(.get_eh(), silent=TRUE)
     if (inherits(eh, "try-error")) {
       eh <- ExperimentHub::ExperimentHub()
       .set_eh(eh)
     }
     eh
   }

Then in EWCE, I have made a script for each dataset containing the function you outlined. Note I had to use ewceData:::get_ExperimentHub to access the function, not sure if this is wrong by me up to this point:

## Exported.
tt_alzh <- function()
{
  eh <- ewceData:::get_ExperimentHub()
  eh[["EH5373"]]
}

Lastly to call a dataset I simply use tt_alzh(). However, now when I run a test script I believe the eh statement is being repeatedly called still as I get multiple warnings and the runtime doesn't improve any.

Sorry again if I have misinterpreted your approach.

Kind regards,
Alan.
________________________________
From: Herv� Pag�s <hpages.on.github using gmail.com>
Sent: 23 March 2021 19:08
To: Murphy, Alan E <a.murphy using imperial.ac.uk>; Martin Morgan <mtmorgan.bioc using gmail.com>; Kern, Lori <Lori.Shepherd using RoswellPark.org>; bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: Re: [Bioc-devel] Methods to speed up R CMD Check

Doesn't really matter where you put them but
.my_internal_global_variables, .get_eh(), .set_eh(), and toto() don't
need to be defined inside the .onLoad() hook, and it's actually
cleaner/easier to not define them there. You can define there anywhere
in your ewceData/R/* files.

Here is a slightly improved version of the code:

   .my_internal_global_variables <- new.env(parent=emptyenv())

   .get_eh <- function() get("eh", envir=.my_internal_global_variables)
   .set_eh <- function(value) assign("eh", value,
envir=.my_internal_global_variables)

   get_ExperimentHub <- function()
   {
     eh <- try(.get_eh(), silent=TRUE)
     if (inherits(eh, "try-error")) {
       eh <- ExperimentHub::ExperimentHub()
       .set_eh(eh)
     }
     eh
   }

In my packages I like to put this kind of low-level stuff in R/utils.R.
None of this is exported.

Then anywhere in your package when you need an ExperimentHub instance, do:

   ## Exported.
   tt_alzh <- function()
   {
     eh <- get_ExperimentHub()
     eh[["EH5373"]]
   }

H.


On 3/23/21 10:53 AM, Murphy, Alan E wrote:
> HeyHerv�,
>
> Thanks for this it is very helpful and I'm very sorry but I have one
> more question, where to put option 3? I thought making an onload r
> script for it:
>
> .onLoad <- function(libname, pkgname) {
>    .my_internal_global_variables <- new.env(parent=emptyenv())
>    .get_eh <- function() get("eh", envir=.my_internal_global_variables)
>    .set_eh <- function(value) assign("eh", value,
>                                      envir=.my_internal_global_variables)
>    toto <- function()
>    {
>      eh <- try(.get_eh(), silent=TRUE)
>      if (inherits(eh, "try-error")) {
>        eh <- ExperimentHub()
>        .set_eh(eh)
>      }
>      eh
>    }
>    toto()
> }
>
> This seems to work in that the script runs (I can tell based on the
> output with devtools::check()) but I still get an error that eh doesn't
> exist in my test functions.
>
> Kind regards,
> Alan.
> ------------------------------------------------------------------------
> *From:* Herv� Pag�s <hpages.on.github using gmail.com>
> *Sent:* 23 March 2021 17:31
> *To:* Murphy, Alan E <a.murphy using imperial.ac.uk>; Martin Morgan
> <mtmorgan.bioc using gmail.com>; Kern, Lori <Lori.Shepherd using RoswellPark.org>;
> bioc-devel using r-project.org <bioc-devel using r-project.org>
> *Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check
> 3 ways to do this, one that doesn't work, and two that work ;-)
>
>
> 1. Simple way that doesn't work:
>
>     ## Just a place holder. Will be initialized at run-time the first
>     ## time it's needed.
>     .some_internal_global_variable <- NULL
>
>     toto <- function()
>     {
>       if (is.null(.some_global_variable))
>           .some_internal_global_variable <<- 55L
>     }
>
>     However, if you put this in your package, you'll get the following
>     error the first time toto() is called:
>
>       cannot change value of locked binding for
> '.some_internal_global_variable'
>
> 2. Simple way that works: initialize the global variable in the
>      .onLoad() hook. This works because the .onLoad() hook is executed
>      right before the package namespace gets locked.
>
>     ## Just a place holder. Will be initialized at load-time.
>     .some_internal_global_variable <- NULL
>
>     .onLoad <- function(libname, pkgname)
>     {
>       .some_internal_global_variable <<- 55L
>     }
>
>     However, I don't really like using this approach when initialization
>     of the global variable requires access to the internet. It means that
>     in case of connectivity issue your users won't be able to load the
>     package and troubleshooting can become really tricky when you can't
>     even load a package. So in that case I prefer the solution below.
>
> 3. Define the internal global variable in an environment:
>
>     .my_internal_global_variables <- new.env(parent=emptyenv())
>
>     .get_eh <- function() get("eh", envir=.my_internal_global_variables)
>     .set_eh <- function(value) assign("eh", value,
> envir=.my_internal_global_variables)
>
>     toto <- function()
>     {
>       eh <- try(.get_eh(), silent=TRUE)
>       if (inherits(eh, "try-error")) {
>         eh <- ExperimentHub()
>         .set_eh(eh)
>       }
>       eh
>     }
>
> Hope this helps,
>
> H.
>
>
>
>
> On 3/23/21 10:05 AM, Murphy, Alan E wrote:
>> Hey Herv�,
>>
>> I get the idea now thanks for clarifying. Where do I place the call to
>> ExperimentHub in the package?:
>>
>> eh <- ExperimentHub()  # the only call to ExperimentHub() in
>>                            # the entire R session
>>
>> The package contains calls to the datasets in internal functions,
>> examples, tests and the vignette so eh it would need to be available to
>> all. Sorry I don't have much experience using experiment datasets.
>>
>> Kind regards,
>> Alan.
>>
>> ------------------------------------------------------------------------
>> *From:* Herv� Pag�s <hpages.on.github using gmail.com>
>> *Sent:* 23 March 2021 16:46
>> *To:* Murphy, Alan E <a.murphy using imperial.ac.uk>; Martin Morgan
>> <mtmorgan.bioc using gmail.com>; Kern, Lori <Lori.Shepherd using RoswellPark.org>;
>> bioc-devel using r-project.org <bioc-devel using r-project.org>
>> *Subject:* Re: [Bioc-devel] Methods to speed up R CMD Check
>>
>> *******************
>> This email originates from outside Imperial. Do not click on links and
>> attachments unless you recognise the sender.
>> If you trust the sender, add them to your safe senders list
>> https://spam.ic.ac.uk/SpamConsole/Senders.aspx
> <https://spam.ic.ac.uk/SpamConsole/Senders.aspx>
>> <https://spam.ic.ac.uk/SpamConsole/Senders.aspx
> <https://spam.ic.ac.uk/SpamConsole/Senders.aspx>> to disable email
>> stamping for this address.
>> *******************
>> On 3/23/21 4:11 AM, Murphy, Alan E wrote:
>>> Hi,
>>>
>>> Thank you very much Martin and Herv� for your suggestions. I have reverted my zzz.R on load function to that advised by ExperimentHub and had used the ID look up (system.time(tt_alzh <- eh[["EH5373"]])) on internal functions and unit tests. However, the  check  is still taking ~18 minutes so I need to do a bit more work.
> Even with
>> my new on load function, calling datasets by name still takes
>> substantially longer, see below for the example Herv� gave on my new code:
>>>
>>> a<-function(){
>>>    eh <- query(ExperimentHub(), "ewceData")
>>
>> The above line is not needed. Creating an ExperimentHub instance can be
>> quite expensive and with the current approach 'R CMD check' will do it
>> dozens of times. My suggestion was to create an ExperimentHub instance
>> once for all the first time you need it, and reuse it in all your data
>> access functions:
>>
>>     eh <- ExperimentHub()  # the only call to ExperimentHub() in
>>                            # the entire R session
>>
>> Also there's no need to query(). Just use the EHb ID directly on the
>> ExperimentHub instance to load your data:
>>
>>     eh[["EH5373"]]
>>
>> This should reduce 'R CMD check' by a few more minutes.
>>
>> H.
>>
>> --
>> Herv� Pag�s
>>
>> Bioconductor Core Team
>> hpages.on.github using gmail.com
>
> --
> Herv� Pag�s
>
> Bioconductor Core Team
> hpages.on.github using gmail.com

--
Herv� Pag�s

Bioconductor Core Team
hpages.on.github using gmail.com

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list