[Bioc-devel] R cmd check time limits for BioConductor

Herve Pages hpages at fhcrc.org
Tue Jun 10 20:00:26 CEST 2008


Hi,

Robert Gentleman wrote:
[...]
>   The time limits were instituted because the build system was unable to 
> complete within 24 hours (and the math is pretty simple, there are over 
> 260 packages, so we need to build more than 10 per hour to be done every 
> day).

Just to clarify:

- We build BioC devel *and* BioC release every day.

- Some build machines are running both builds (devel and release) so at most
   12 hours can be spent on each build (the devel builds run from noon to midnight
   and the release builds from midnight to noon, Seattle time).

- The builds are parallelized, i.e. up to 4 'R CMD check' processes can run
   simultaneously on the same build machine at any given time. As a consequence,
   an entire build run (250-270 packages) takes between 6 and 11 hours
   on each build machine (the 64-bit Linux machines like wilson1-2 are the
   fastest). Parallelization is the only way an entire build run can be done
   in less than 12 hours on all the machines.

- Note that 'R CMD check' is not the only command that is executed for each
   package. The build stages are: (a) install the dependencies, (b) run 'R CMD build',
   (c) run 'R CMD check' and (d) build the binary package (on Windows and Mac OS X
   only).
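
To make the time budget concrete, here is a rough back-of-envelope sketch in
Python using only the figures above (12-hour window, up to 4 simultaneous
processes, 250-270 packages). It assumes that all 4 slots stay busy for the
whole window and that the per-package budget has to cover every build stage,
not just 'R CMD check'; both are simplifications, and the function names are
made up for illustration.

```python
# Back-of-envelope time budget for one build run, using the figures above.
# Assumption: every parallel slot stays busy for the whole window.
WINDOW_HOURS = 12            # each build (devel or release) gets at most 12 hours
PARALLEL_JOBS = 4            # up to 4 processes at once on a build machine
PACKAGE_COUNTS = (250, 270)  # size of an entire build run


def required_rate(packages: int, window_hours: float) -> float:
    """Packages that must finish per wall-clock hour to fit the window."""
    return packages / window_hours


def budget_minutes(packages: int, window_hours: float, jobs: int) -> float:
    """Average minutes one package may occupy a slot, all stages included."""
    return window_hours * 60 * jobs / packages


for n in PACKAGE_COUNTS:
    rate = required_rate(n, WINDOW_HOURS)
    parallel = budget_minutes(n, WINDOW_HOURS, PARALLEL_JOBS)
    serial = budget_minutes(n, WINDOW_HOURS, 1)
    print(f"{n} packages: {rate:.1f}/hour needed, "
          f"{parallel:.1f} min/package with {PARALLEL_JOBS} slots, "
          f"{serial:.1f} min/package with 1 slot")
```

Under these assumptions the average package gets roughly 11 minutes across
all stages with 4 slots, and under 3 minutes without parallelization.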

During a single build run, a lot of CPU cycles are wasted because the same
work is done several times. For example, each vignette is tested twice:
the 1st time by 'R CMD build' and the 2nd time by 'R CMD check'. We could easily
avoid this by running 'R CMD check --no-vignettes': that would probably make
the builds 10%-30% faster without compromising the current testing paradigm.
Other things are done several times (like installing the exact same package 2
or 3 times, even 4 times in some rare situations), but trying to avoid this
would be more complicated.
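
As a rough illustration of what skipping the duplicate vignette pass could
mean, the sketch below applies the estimated 10%-30% gain to the current
6-11 hour range. It reads "10%-30% faster" as a 10%-30% reduction in
wall-clock time, which is an assumption, and the numbers are estimates, not
measurements.

```python
# Projected duration of a build run if 'R CMD check --no-vignettes' is used.
# Assumption: "10%-30% faster" means 10%-30% less wall-clock time.
CURRENT_HOURS = (6.0, 11.0)  # current range for an entire build run
TIME_SAVED = (0.10, 0.30)    # estimated fraction of time saved


def projected_hours(current: float, saved_fraction: float) -> float:
    """Run duration after removing the given fraction of wall-clock time."""
    return current * (1.0 - saved_fraction)


for hours in CURRENT_HOURS:
    slowest = projected_hours(hours, min(TIME_SAVED))
    fastest = projected_hours(hours, max(TIME_SAVED))
    print(f"{hours:.0f} h run -> between {fastest:.1f} h and {slowest:.1f} h")
```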

Cheers,
H.


