[Bioc-devel] R cmd check time limits for BioConductor

Tue Jun 10 22:27:18 CEST 2008

Dear Kevin,

> I also understand the need for the time limits, and agree with their 
> rationale. But building ("R CMD build" or "R CMD build --binary") for 
> the package I'm putting together is pretty fast. The part that takes 
> time is checking, largely because I am running regression tests on 
> several different honest-to-god real data sets. And I'm doing the 
> testing with at least three different statistical models to fit the data 
> (since we don't claim to know which one will ultimately turn out to be 
> the best). Now, I could game the system by breaking everything into 3 
> different sub-packages, where each test could probably be completed 
> within five minutes. However, that seems like a rather artificial 
> approach to the problem.  I guess what I'd really like is two different 
> levels of testing. In addition to the time-consuming tests, my package 
> also has scripts that do basic tests to make sure that the code behaves 
> (as) sensibly (as possible) even when the user hands it the wrong kind 
> of inputs. It would be nice to have a system that allowed these "easy 
> but important interface tests" to run routinely while keeping the "time 
> consuming core algorithm development" tests around for when you are 
> testing out more radical changes.

I think I had similar issues with the vsn and tilingArray packages. Their
vignettes contain functional tests that take hours or days to run and, in
the tilingArray case, need more than 10 GB of RAM. Like you, I feel that
these are necessary to test the packages properly, but there is no way
the build system in Seattle should do this every night.

So these vignettes sit in the inst/scripts directory, and I (or anyone
else) can run them manually whenever the urge strikes. You might be
interested in the "Makefile" in inst/doc, which makes sure that the PDFs
from these vignettes are still visible to package users, just like
"normal" vignettes.
-- 
Best wishes
  Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber

> 
> Robert Gentleman wrote:
>> Hi Kevin,
>>
>> Kevin R. Coombes wrote:
>>> Hi,
>>>
>>> I have considered that possibility, but am not yet convinced that it 
>>> is the best approach. I will, of course, do something like that if I 
>>> cannot persuade this list that an alternative approach might be 
>>> better. The 
>>
>>
>>   These are good points, and not ones that are somehow suddenly new. 
>> The issues are ones we grapple with, and there are many different 
>> solutions that developers have taken.
>>
>>   First, I think that there may be a misconception in play. The build 
>> system is essentially just that: a build system.  We don't have the 
>> resources to provide a comprehensive testing resource for developers. 
>> We do minimal checks and push out the packages that pass those in as 
>> timely a fashion as possible. We do expect developers to take the
>> steps you are describing and, before committing code to BioC, to run
>> all the testing they deem necessary. But I don't see where there is,
>> or should be, a reliance on having that testing done every day on four
>> (or more) platforms, or done by Bioconductor at all.  I think that is
>> one of the developer's responsibilities, not the project's.
>>
>>   The time limits were instituted because the build system was unable 
>> to complete within 24 hours (and the math is pretty simple, there are 
>> over 260 packages, so we need to build more than 10 per hour to be 
>> done every day).  And as we upgrade equipment, we are hoping to be 
>> able to keep the guidelines as they are.  Other options are to allow 
>> longer tests but have longer delays before packages are ready. My 
>> impression is that most developers would rather have their code 
>> available sooner, but I would appreciate hearing alternative points of 
>> view.  Perhaps this topic can be visited during the developer day at 
>> BioC2008.
>>
>>   I strongly encourage you to do the testing you believe is 
>> appropriate.  I will mention that the basic checks for all of R (and 
>> there are two levels of testing even there) would run within the time 
>> frame we have asked package maintainers to meet.
>>
>>   best wishes
>>     Robert
>>
>>
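The arithmetic behind the time limit in Robert's message can be sketched
as follows. This is only a rough illustration: the figures (260 packages,
a 24-hour cycle) come from the message itself, while the function names
and the assumption that packages are built strictly one after another on
a single machine are mine and are not stated in the thread.

```python
# Per-package time budget implied by the figures quoted above.
# Assumption (not stated in the thread): packages are built one at a
# time on a single machine, with no parallelism.

PACKAGES = 260        # "over 260 packages"
HOURS_PER_CYCLE = 24  # the build has to finish within a day


def packages_per_hour(packages: int, hours: float) -> float:
    """Throughput needed to finish every package within the cycle."""
    return packages / hours


def minutes_per_package(packages: int, hours: float) -> float:
    """Average wall-clock budget for one package, in minutes."""
    return hours * 60 / packages


if __name__ == "__main__":
    rate = packages_per_hour(PACKAGES, HOURS_PER_CYCLE)
    budget = minutes_per_package(PACKAGES, HOURS_PER_CYCLE)
    print(f"required rate: {rate:.1f} packages/hour")
    print(f"average budget: {budget:.1f} minutes/package")
```

Under that assumption the average budget comes out at roughly five and a
half minutes per package, which is in line with the five-minute guideline
discussed in this thread.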
>>> basic argument is:
>>>
>>> * Complex algorithms can be better maintained if they are accompanied 
>>> by regression testing.
>>> * "R CMD check" provides an automated method to run regression tests, 
>>> with a defined directory structure for storing those tests.
>>> * Changing the directory location in the source makes running the 
>>> regression tests more awkward and thus less likely to occur on a 
>>> regular basis.
>>> * The "--no-tests" argument already provides a mechanism for 
>>> preventing the tests from being run.
>>>
>>> What appears to be missing is a mechanism either to designate the
>>> tests as optional or to indicate a preference for not running some or
>>> all of them. I can think of three ways to accomplish my goals in this
>>> matter:
>>>
>>> [1] Make "--no-tests" the default way to run "R CMD check" at 
>>> BioConductor. (Of course, this is unlikely to be the optimal solution 
>>> since it merely avoids the question.)
>>> [2] Add a field to the DESCRIPTION file that tells "R CMD check" 
>>> whether or not to run the tests. Something like
>>>     Tests: run
>>> or
>>>     Tests: dontrun
>>> [3] Add an optional special file in the tests directory that 
>>> indicates the complexity/length of the tests that would allow "R CMD 
>>> check" to decide whether or not to run them. Perhaps something like
>>>
>>> ###################
>>> # COMPLEXITY file
>>>
>>> test1.R: long
>>> test2.R: short
>>> ...
>>> ###################
>>>
>>> Of course, options [2] or [3] require changes to "R CMD check" (for 
>>> which I should eventually move this discussion to the R-devel list), 
>>> but I am really only interested in convincing BioConductor that 
>>> (possibly complex) regression tests are a good thing, and should be 
>>> encouraged by adopting something like [1].
>>>
>>> Best,
>>>     Kevin
>>>
>>> Laurent Gautier wrote:
>>>> 2008/6/10 Kevin R. Coombes <krcoombes at mdacc.tmc.edu>:
>>>>> Hi,
>>>>>
>>>>> The BioConductor package guidelines say that a package should take
>>>>> less than five minutes to run "R CMD check". I have a package that is
>>>>> almost ready to submit; however, it currently includes nontrivial
>>>>> regression testing in the "tests" subdirectory. With the tests, the
>>>>> time for "R CMD check" could be significantly longer than five
>>>>> minutes. Without the tests, the package easily fits within the time
>>>>> limit.
>>>>>
>>>>> [1] I know that I can run "R CMD check --no-tests [PKG]" to prevent
>>>>> the tests from running when I check the code myself. Is there any way
>>>>> for a package submitted to BioConductor to indicate that the tests
>>>>> should be skipped?
>>>>>
>>>>> [2] Alternatively, is there an easy way to include the tests so that
>>>>> I can run them whenever I want to make sure I haven't broken the code
>>>>> (too badly ...), but not force everyone else to run them when checking
>>>>> the rest of the structure of the code and documentation?
>>>>
>>>> You could consider having them in your package, in a directory
>>>> inst/tests/ for example
>>>> (so the tests are still available from an installed package).
>>>>
>>>>> Thanks in advance,
>>>>>    Kevin
>>>>>