[Bioc-devel] R cmd check time limits for BioConductor

Tue Jun 10 21:03:45 CEST 2008

This is an important topic.  I believe that the building/checking
process is a key source of value added by the Bioc project for developers.

A worthwhile thought experiment: What would be done if we had unlimited
resources?  It seems to me that a primary resource that a developer
lacks access to is the range of platforms -- hardware, OS, compilers --
that we want users to be able to work with reliably.  Thus a very
high priority for the build/check system is coverage of the main
varieties of "system" that users are likely to use Bioconductor on.

The devel branch is very important because it indicates how ongoing
changes to R affect performance/accuracy/interactions of package code.  Most
developers aren't going to be updating R and all other packages approximately
daily.  Thus, without the devel build system, many developers would
find themselves working hard when a new R release became imminent, to
port abruptly to the new release.  The devel build system allows this
to occur somewhat more smoothly, and affords the possibility of feedback
to R core when tentative changes to R are problematic.

I recognize that the points just made are well-known, and these remarks
are not made defensively.  Instead I am trying to come up with some
a priori limits to testing requirements falling to Bioconductor, as
opposed to requirements that lie uniquely with the developer.  So the
first two priorities are, briefly:

1) Cover the platforms
2) Track performance relative to evolving R

Now we know full well that our resources require that package testing
be limited to around five minutes.  Is this in itself a value or an
obstacle to ensuring project/package reliability?

My sense is that this constraint might be a value -- I understand that
a very multifaceted package may have many essential tests that each finish
rapidly, yet whose combined running time exceeds our limits.  Perhaps that
package should be broken up... Perhaps it should get an exception...
I don't know.  Other things being equal, I think it is good for the software
that there be legitimate tests that run quickly.  It means
users can demonstrate the functionality in short order, that the tests
can be varied and examined without long delay, and that, probably, we
have capacity to do more tests of different facets of the package.

From the project perspective, I feel that the tests we need to be
most concerned about are those that would indicate problems with
respect to priorities 1) and 2).  Portability problems, or dependencies
on ephemeral features of R, might only crop up with certain very long
tests ... but I think that would be exceptional.  Thus the deep tests
should be done "at home" and the light ones left for the project
system.

Finally, the complexity of the project test/build system has to be
kept very manageable -- we have {release, devel} X platform X {software,
experiment, annotation} and introducing a short/long testing stream
may be feasible but may not pay off.  The point is that even if we did
have a lot more resources I am not sure it would be sound to allow
indefinite test return times.
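
To make the size of that matrix concrete, here is a rough sketch (in Python,
purely for illustration) that enumerates the combinations.  The platform
labels are hypothetical placeholders, not the project's actual build nodes,
and the short/long axis is the proposed stream, not something that exists.

```python
from itertools import product

# Axes named in the message above.
branches = ["release", "devel"]
package_types = ["software", "experiment", "annotation"]

# ASSUMPTION: hypothetical platform labels; the message does not list them.
platforms = ["linux", "windows", "macos"]


def build_matrix(*axes):
    """Return every combination of the given axes as a list of tuples."""
    return list(product(*axes))


# The matrix as described: branch x platform x package type.
current = build_matrix(branches, platforms, package_types)

# The same matrix with a hypothetical short/long testing stream added.
with_streams = build_matrix(branches, platforms, package_types, ["short", "long"])

print(len(current))       # number of build/check configurations today
print(len(with_streams))  # doubles once the extra axis is introduced
```

Every added axis multiplies the number of configurations that must be
scheduled, monitored and reported on, which is the manageability concern.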

I am open to correction or criticism on any point made above.  I am
trying to articulate some points about testing in the project that
are probably quite superficial from the perspective of serious software
development and engineering -- yet the breadth of testing
accomplished and the necessity of release/devel branches are not very
widely appreciated among some of my contacts in other domains ...
so I have taken this opportunity to articulate these views to the devel
group.
