[Bioc-devel] R cmd check time limits for BioConductor

Kevin R. Coombes krcoombes at mdacc.tmc.edu
Tue Jun 10 19:34:46 CEST 2008


Hi Robert,

I didn't think the issues were new, but I figured it couldn't hurt to 
raise them again....

I also understand the need for the time limits, and agree with their 
rationale. But building ("R CMD build" or "R CMD build --binary") for 
the package I'm putting together is pretty fast. The part that takes 
time is checking, largely because I am running regression tests on 
several different honest-to-god real data sets. And I'm doing the 
testing with at least three different statistical models to fit the data 
(since we don't claim to know which one will ultimately turn out to be 
the best).

Now, I could game the system by breaking everything into three 
different sub-packages, where each test could probably be completed 
within five minutes. However, that seems like a rather artificial 
approach to the problem.

I guess what I'd really like is two different levels of testing. In 
addition to the time-consuming tests, my package also has scripts that 
do basic tests to make sure that the code behaves (as) sensibly (as 
possible) even when the user hands it the wrong kind of inputs. It 
would be nice to have a system that allowed these "easy but important 
interface tests" to run routinely while keeping the "time-consuming 
core algorithm development" tests around for when you are testing out 
more radical changes.
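
To make the two-level idea concrete, here is a minimal sketch of how a 
checker might pick which test scripts to run. It borrows the COMPLEXITY 
file format from option [3] in my earlier message (quoted below). 
Everything in it is hypothetical: "R CMD check" has no such feature, the 
function names parse_complexity and select_tests are made up for 
illustration, and it is written in Python only to keep the example short 
and self-contained.

```python
# Hypothetical sketch: nothing here is part of "R CMD check".
# The file format mirrors the COMPLEXITY example from option [3].

def parse_complexity(text):
    """Map each test script name to its declared level ("short" or "long")."""
    levels = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, '#' comment/banner lines and anything without a ':'
        # (such as the '...' placeholder).
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, level = line.split(":", 1)
        levels[name.strip()] = level.strip().lower()
    return levels


def select_tests(levels, run_long=False):
    """Return the scripts to run: short ones always, long ones only on request."""
    return sorted(
        name for name, level in levels.items()
        if level == "short" or (run_long and level == "long")
    )


COMPLEXITY = """\
###################
# COMPLEXITY file

test1.R: long
test2.R: short
###################
"""

levels = parse_complexity(COMPLEXITY)
routine = select_tests(levels)               # routine check: short tests only
full = select_tests(levels, run_long=True)   # developer run: every test
print(routine)
print(full)
```

With a scheme like this the routine check would run only test2.R, while 
a developer asking for the long tests would get both scripts.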

Thanks for listening ...

	Kevin

Robert Gentleman wrote:
> Hi Kevin,
> 
> Kevin R. Coombes wrote:
>> Hi,
>>
>> I have considered that possibility, but am not yet convinced that it 
>> is the best approach. I will, of course, do something like that if I 
>> cannot persuade this list that an alternative approach might be 
>> better. The 
> 
> 
> Hi Kevin,
>   These are good points, and not ones that are somehow suddenly new. The 
> issues are ones we grapple with, and there are many different solutions 
> that developers have taken.
> 
>   First, I think that there may be a misconception in play. The build 
> system is essentially just that: a build system.  We don't have the 
> resources to provide a comprehensive testing resource for developers. We 
> do minimal checks and push out the packages that pass those in as timely 
> a fashion as possible. We do expect developers to take the steps you 
> are describing and, before committing code to BioC, to run all the 
> testing they deem necessary. But I don't see where there is, or should 
> be, a reliance on having the testing done every day on four (or more) 
> platforms, or done by Bioconductor at all.  I think that is one of the 
> developer's responsibilities, not the project's.
> 
>   The time limits were instituted because the build system was unable to 
> complete within 24 hours (the math is pretty simple: there are over 
> 260 packages, so we need to build more than 10 per hour to be done every 
> day).  And as we upgrade equipment, we are hoping to be able to keep the 
> guidelines as they are.  Other options are to allow longer tests but 
> have longer delays before packages are ready. My impression is that most 
> developers would rather have their code available sooner, but I would 
> appreciate hearing alternative points of view.  Perhaps this topic can 
> be visited during the developer day at BioC2008.
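
A back-of-the-envelope version of that arithmetic, as a short Python 
sketch. The two input figures come from the paragraph above; the idea 
that packages are processed strictly one after another on a single 
machine is an assumption made here for illustration, not a description 
of the actual build system.

```python
# Figures from the message: "over 260 packages" and a 24-hour cycle.
# Assumption (illustration only): packages are processed one at a time.
packages = 260
hours_available = 24

packages_per_hour = packages / hours_available          # required throughput
minutes_per_package = hours_available * 60 / packages   # average time budget

print(f"{packages_per_hour:.1f} packages per hour")     # 10.8 packages per hour
print(f"{minutes_per_package:.1f} minutes per package") # 5.5 minutes per package
```

Under that assumption the average budget is about five and a half 
minutes per package, which is in the same range as the five-minute 
guideline discussed in this thread.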
> 
>   I strongly encourage you to do the testing you believe is 
> appropriate.  I will mention that the basic checks for all of R (and 
> there are two levels of testing even there) would run within the time 
> frame we have asked package maintainers to meet.
> 
>   best wishes
>     Robert
> 
> 
>> basic argument is:
>>
>> * Complex algorithms can be better maintained if they are accompanied 
>> by regression testing.
>> * "R CMD check" provides an automated method to run regression tests, 
>> with a defined directory structure for storing those tests.
>> * Changing the directory location in the source makes running the 
>> regression tests more awkward and thus less likely to occur on a 
>> regular basis.
>> * The "--no-tests" argument already provides a mechanism for 
>> preventing the tests from being run.
>>
>> What appears to be missing is a mechanism either to designate the 
>> tests as optional or to indicate a preference for not running some or 
>> all of them. I can think of three ways to accomplish my goals in this 
>> matter:
>>
>> [1] Make "--no-tests" the default way to run "R CMD check" at 
>> BioConductor. (Of course, this is unlikely to be the optimal solution 
>> since it merely avoids the question.)
>> [2] Add a field to the DESCRIPTION file that tells "R CMD check" 
>> whether or not to run the tests. Something like
>>     Tests: run
>> or
>>     Tests: dontrun
>> [3] Add an optional special file in the tests directory that indicates 
>> the complexity/length of the tests, which would allow "R CMD check" to 
>> decide whether or not to run them. Perhaps something like
>>
>> ###################
>> # COMPLEXITY file
>>
>> test1.R: long
>> test2.R: short
>> ...
>> ###################
>>
>> Of course, options [2] or [3] require changes to "R CMD check" (for 
>> which I should eventually move this discussion to the R-devel list), 
>> but I am really only interested in convincing BioConductor that 
>> (possibly complex) regression tests are a good thing, and should be 
>> encouraged by adopting something like [1].
>>
>> Best,
>>     Kevin
>>
>> Laurent Gautier wrote:
>>> 2008/6/10 Kevin R. Coombes <krcoombes at mdacc.tmc.edu>:
>>>> Hi,
>>>>
>>>> The BioConductor package guidelines say that a package should take
>>>> less than five minutes to run "R CMD check". I have a package that is
>>>> almost ready to submit; however, it currently includes nontrivial
>>>> regression testing in the "tests" subdirectory. With the tests, the
>>>> time for "R CMD check" could be significantly longer than five
>>>> minutes. Without the tests, the package easily fits within the time
>>>> limit.
>>>>
>>>> [1] I know that I can run "R CMD check --no-tests [PKG]" to prevent the
>>>> tests from running when I check the code myself. Is there any way for a
>>>> package submitted to BioConductor to indicate that the tests should be
>>>> skipped?
>>>>
>>>> [2] Alternatively, is there an easy way to include the tests so that
>>>> I can run them whenever I want to make sure I haven't broken the code
>>>> (too badly ...), but not force everyone else to run them when checking
>>>> the rest of the structure of the code and documentation?
>>>
>>> You could consider having them in your package, in a directory
>>> inst/tests/ for example
>>> (so the tests are still available from an installed package).
>>>
>>>> Thanks in advance,
>>>>    Kevin
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at stat.math.ethz.ch mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>
>>
>


