[Bioc-devel] Methods to speed up R CMD Check

Hervé Pagès hpages.on.github using gmail.com
Tue Mar 23 17:54:30 CET 2021


On 3/23/21 6:33 AM, Mike Smith wrote:
...
> Finally I'll point out there's a testthat::skip_on_bioc() function that
> will allow you to skip a test on the Bioc builder, but still run that test
> locally/on GitHub etc.

What?!

 > testthat::skip_on_bioc
function ()
{
     if (identical(Sys.getenv("BBS_HOME"), "")) {
         return(invisible(TRUE))
     }
     skip("On Bioconductor")
}
<bytecode: 0x564820802278>
<environment: namespace:testthat>

No way! I need to rename that BBS_HOME env variable ;-)

> However, I think we'd all agree it'd be better to
> get all the tests running universally, rather than take that route.

You bet.

Or move the long tests to the longtests/ folder and subscribe to the 
Long Tests builds:

   https://bioconductor.org/developers/how-to/long-tests/

You'll only be able to do so once your package is accepted, though, so it 
doesn't really help in the context of the package review.

H.

> 
> Mike
> 
> On Tue, 23 Mar 2021 at 12:11, Murphy, Alan E <a.murphy using imperial.ac.uk>
> wrote:
> 
>> Hi,
>>
>> Thank you very much Martin and Hervé for your suggestions. I have reverted
>> my zzz.R on load function to that advised by ExperimentHub and had used the
>> ID look up (system.time(tt_alzh <- eh[["EH5373"]])) on internal functions
>> and unit tests. However, the check is still taking ~18 minutes so I need to
>> do a bit more work. Even with my new on load function, calling datasets by
>> name still takes substantially longer, see below for the example Hervé gave
>> on my new code:
>>
>> a<-function(){
>>    eh <- query(ExperimentHub(), "ewceData")
>>    tt_alzh <- eh[["EH5373"]]
>> }
>> microbenchmark::microbenchmark(a,
>>                                 tt_alzh <- ewceData::tt_alzh(),
>>                                 times=20L,unit="s")
>>> Unit: seconds
>>>                            expr        min          lq         mean      median          uq         max neval
>>>                               a 0.00000003 0.000000031 0.0000002995 0.000000045 0.000000684 0.000001064    20
>>>  tt_alzh <- ewceData::tt_alzh() 2.71135788 2.755388420 2.9922968274 2.993737666 3.144241330 3.842422679    20
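
A side note on the benchmark above, offered as a possible explanation rather than a diagnosis: `a` is passed to microbenchmark without parentheses, so what gets timed may be the evaluation of the symbol `a` and not a call to the function. That would be consistent with the nanosecond timings. The sketch below illustrates the difference with base R only; `slow` is a hypothetical stand-in for a data-loading helper, not a function from EWCE or ewceData.

```r
# Hypothetical slow function standing in for a data-loading helper.
slow <- function() {
  Sys.sleep(0.1)
  42
}

# Evaluating the bare symbol only looks the function up; nothing runs.
t_symbol <- system.time(for (i in 1:3) slow)[["elapsed"]]

# Adding parentheses actually calls the function on every iteration.
t_call <- system.time(for (i in 1:3) slow())[["elapsed"]]

cat("symbol:", t_symbol, "s  call:", t_call, "s\n")
```

If that is what happened, benchmarking `a()` instead of `a` should give a fairer comparison between the two ways of loading the dataset.
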
>>
>> My question is would it be acceptable to change my data load calls in my
>> examples and the vignette to reduce the runtime or is this against best
>> practice and should I look for improvements elsewhere? I ask because I feel
>> I'm running out of easy options at reducing the overall runtime.
>>
>> Kind regards,
>> Alan.
>>
>>
>> ________________________________
>> From: Martin Morgan <mtmorgan.bioc using gmail.com>
>> Sent: 22 March 2021 18:17
>> To: Kern, Lori <Lori.Shepherd using RoswellPark.org>; Murphy, Alan E <
>> a.murphy using imperial.ac.uk>; bioc-devel using r-project.org <
>> bioc-devel using r-project.org>
>> Subject: Re: [Bioc-devel] Methods to speed up R CMD Check
>>
>> (sticking bioc-devel back in the recipient list so others can learn /
>> improve / disagree with this suggestion.)
>>
>> my suggestion was to memoise the function in your package, not in the
>> example. Examples are not run independently, but collated into a single
>> file (EWCE-Ex.R in the EWCE.Rcheck directory, after running R CMD check)
>> and sourced. And the suggestion was not to solve the problem of examples
>> running slowly, but to avoid repeatedly calculating the same value. For
>> instance, from Hervé’s email ewceData::tt_alzh could be memoised in the
>> package. The first call would take several seconds, but subsequent calls
>> would be instantaneous. But as Hervé says that function should be cleaned
>> up anyway so that 'tricks' like memoisation might not be necessary.
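
To make the idea above concrete, here is a minimal sketch of caching a slow loader inside a package. It uses a plain base-R closure as a stand-in for the memoise package so that it runs without extra dependencies; with memoise installed, wrapping the function in `memoise::memoise()` would play the same role. The names `load_dataset`, `memoise_by_name` and `load_dataset_cached` are hypothetical, and the `Sys.sleep()` call only simulates a slow hub lookup.

```r
# Hypothetical stand-in for a slow loader such as ewceData::tt_alzh().
load_dataset <- function(name) {
  Sys.sleep(0.2)                 # simulate a slow hub lookup
  paste("data for", name)
}

# Minimal memoiser: caches results keyed by a single character argument.
memoise_by_name <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(name) {
    if (!exists(name, envir = cache, inherits = FALSE)) {
      assign(name, f(name), envir = cache)   # first call: compute and store
    }
    get(name, envir = cache, inherits = FALSE)  # later calls: reuse
  }
}

# Defined once at package level, so examples, tests and the vignette
# running in the same R session share the cache.
load_dataset_cached <- memoise_by_name(load_dataset)

first  <- system.time(x1 <- load_dataset_cached("tt_alzh"))[["elapsed"]]
second <- system.time(x2 <- load_dataset_cached("tt_alzh"))[["elapsed"]]
cat("first call:", first, "s  second call:", second, "s\n")
```

The cache only lives for the R session, so it helps within the collated examples file or within a test run, but not across separate check steps.
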
>>
>>
>> From: "Murphy, Alan E" <a.murphy using imperial.ac.uk>
>> Date: Monday, March 22, 2021 at 12:37 PM
>> To: Martin Morgan <mtmorgan.bioc using gmail.com>
>> Subject: Re: [Bioc-devel] Methods to speed up R CMD Check
>>
>> Hey Martin,
>>
>> Thanks for the suggestion but how would I go about using this, let's say,
>> for the examples? If I redefine the memoise function in each example (as it
>> won't otherwise exist) would this not take the same amount of time?
>>
>> Kind regards,
>> Alan.
>>
>> From: Martin Morgan <mtmorgan.bioc using gmail.com>
>> Sent: 22 March 2021 13:34
>> To: Kern, Lori <Lori.Shepherd using RoswellPark.org>; Murphy, Alan E <
>> a.murphy using imperial.ac.uk>; bioc-devel using r-project.org <
>> bioc-devel using r-project.org>
>> Subject: Re: [Bioc-devel] Methods to speed up R CMD Check
>>
>>
>> if your examples repeatedly calculate the same thing, and this is also
>> typical of how users use your package, it might make sense to 'memoise' key
>> functions in your package https://cran.r-project.org/package=memoise
>>
>> Martin
>>
>> On 3/22/21, 7:41 AM, "Bioc-devel on behalf of Kern, Lori" <
>> bioc-devel-bounces using r-project.org on behalf of
>> Lori.Shepherd using RoswellPark.org> wrote:
>>
>>      If your data is using ExperimentHub,  it should already be caching the
>> downloaded data.  Once it is downloaded once, it should be using the cached
>> download for subsequent calls to the hub.  We will investigate to ensure
>> that the caching mechanism is functioning properly on all of our
>> Bioconductor builders.
>>
>>
>>
>>      Lori Shepherd
>>
>>      Bioconductor Core Team
>>
>>      Roswell Park Comprehensive Cancer Center
>>
>>      Department of Biostatistics & Bioinformatics
>>
>>      Elm & Carlton Streets
>>
>>      Buffalo, New York 14263
>>
>>      ________________________________
>>      From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of
>> Murphy, Alan E <a.murphy using imperial.ac.uk>
>>      Sent: Monday, March 22, 2021 5:38 AM
>>      To: bioc-devel using r-project.org <bioc-devel using r-project.org>
>>      Subject: [Bioc-devel] Methods to speed up R CMD Check
>>
>>      Hi all,
>>
>>      I am working on the development of [EWCE](https://github.com/NathanSkene/EWCE)
>> but have hit an issue with R CMD check's runtime. I have been informed this
>> test needs to be completed in 15 minutes but mine is currently running in
>> ~24 minutes and I am looking for methods to speed this up. The main
>> culprits for the runtime issue are:
>>
>>      checking examples (5m 49.8s)
>>      Running ‘testthat.R’ [308s/469s] (7m 49.1s)
>>      checking for unstated dependencies in vignettes (7m 49.4s)
>>      checking re-building of vignette outputs (5m 12s)
>>
>>      With the exception of using smaller datasets which I will consider
>> myself, are there known ways of speeding these up? EWCE derives data from an
>> ExperimentHub package [ewceData](https://github.com/neurogenomics/ewceData)
>> for its examples, tests and vignette. This is run repeatedly and I have
>> noted this takes a significant amount of time to load a dataset. Is there
>> any way of caching the datasets for all the checks or more generally of
>> speeding this up?
>>
>>      I have heard of the use of [long tests](http://bioconductor.org/developers/how-to/long-tests/)
>> which aren't run daily by Bioconductor but are these still checked in R CMD
>> Check? Is there any other way to exclude my tests from the R CMD Check
>> given they aren't a necessity from Bioconductor?
>>
>>      Does checking for unstated dependencies in vignettes have a long
>> runtime based on the number of package dependencies? If I just export
>> specific functions from packages will this check time reduce?
>>
>>      Lastly, is there any way to get an exemption from the 15 minute maximum?
>> I may be ill-informed, but is the max time for packages on Bioconductor's
>> daily check 40 minutes? My code in its current state would complete within that.
>>
>>      Kind regards,
>>      Alan.
>>
>>
>>              [[alternative HTML version deleted]]
>>
>>
>>
>>
>>      _______________________________________________
>>      Bioc-devel using r-project.org mailing list
>>      https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.github using gmail.com


