[Bioc-devel] Federation of sub-packages acceptable for Bioconductor?

Tue Apr 16 17:45:37 CEST 2013

On 04/16/2013 05:59 AM, Ulrich Bodenhofer wrote:
> Hi,
>
> Some students of mine are currently working on a unified R interface to multiple
> C/C++ algorithms of a kind (too early to mention details). The special challenges
> are that some of these algorithms have special requirements, that not
> necessarily all of them will run on all target platforms, and that all make quite
> special settings in their Makefiles. In order to deal with these challenges,
> in particular, to be able to provide a maximum of algorithms on each platform, we
> have thought of putting all algorithms in separate packages and writing a wrapper
> package that provides a unified interface to all algorithms available on the
> respective platform.
>
> To illustrate our idea, consider the following example: suppose that the algorithms
> are implemented in packages fooAlg1, fooAlg2 and fooAlg3, which are mutually
> independent of each other. Basic functionality that is used by all three packages
> is contained in a package fooCommon on which the packages fooAlg1, fooAlg2 and
> fooAlg3 all depend. Then the wrapper package foo provides a unified
> interface to those packages which are available on the given platform:
>
>           +-- suggests --> fooAlg1 >-- depends --+
>           |                                      |
>    foo >--+-- suggests --> fooAlg2 >-- depends --+--> fooCommon
>           |                                      |
>           +-- suggests --> fooAlg3 >-- depends --+
>
> If the user tries to call an algorithm via the unified interface, package foo
> checks if the required package is available. If so, it loads the package and
> runs the algorithm. If not, an error message is shown.
>
> So far, so good. Now my questions:
>
> - Do you think this is a good idea?
>    If not, do you have a better suggestion?
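
The availability check described in the example above could be sketched in R
roughly as follows. This is only an illustration under stated assumptions: the
package names foo, fooAlg1, fooAlg2 and fooAlg3 are the hypothetical ones from
the example, while the function name runAlgorithm and the convention that each
algorithm package exports a function called run are invented here.

```r
# Hypothetical dispatcher inside the wrapper package 'foo'.
# Assumption: each backend package (fooAlg1, fooAlg2, fooAlg3) exports run().
runAlgorithm <- function(algorithm, ...) {
    pkg <- paste0("foo", algorithm)   # e.g. "Alg1" -> "fooAlg1"

    # requireNamespace() returns FALSE instead of failing when the
    # suggested package is not installed
    if (!requireNamespace(pkg, quietly = TRUE))
        stop("algorithm '", algorithm, "' requires package '", pkg,
             "', which is not available on this platform")

    # look up the backend's exported entry point and call it
    run <- getExportedValue(pkg, "run")
    run(...)
}
```

On a platform where, say, fooAlg2 is not installed, runAlgorithm("Alg2") would
signal the error above rather than attempt to load the package.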

Hi Ulrich -- sounds like an interesting project, and of course it's great to 
make highly performant and relevant code available to a wide audience!

The Bioconductor policy is that packages are available across platforms. Really 
this benefits the end user (no need to discover why it worked on my Mac but not 
on the computer cluster) and the developer (only one branch of code to 
maintain). There are exceptions in Bioconductor, but invariably these cause 
problems for users, developers, and Bioconductor, and weaken the appeal of a 
high-level language touting reproducible research -- it is a _mistake_ to plan 
to implement something that will be so unsatisfactory!

I would instead recommend the difficult path of identifying and implementing the 
cross-platform core of your ideas and ambitions; this in itself can be a 
rewarding software development activity.

> - Would it be acceptable to submit a whole bunch of packages to Bioconductor
>    instead of one single package that contains all algorithms?
>    If yes, would it be ok if the documentation were concentrated in the
>    wrapper package foo? (In particular, would it be ok if fooAlg1, fooAlg2 and
>    fooAlg3 had no vignettes?)

Bioconductor policy is that packages have vignettes.

Having complicated dependencies requires considerable discipline on the part of 
developers (maybe you have control over this...) and users (but not this!). It 
also makes use of the package difficult, as your consideration of documentation 
implies -- the user will easily get lost in the already confusing R help system. 
I would instead identify strategies to organize your code within a single 
package. I would also encourage an evolutionary design, where if the full 
ambition of your project is realized and adopted by a broad or deep community of 
users, perhaps in the future the single package evolves to several packages; 
this approach is simplified when the software works uniformly across platforms.

I think also that R and Bioconductor users are, speaking broadly, different from 
general purpose programming language users; they have well-defined use cases 
(differential expression in RNA-seq; copy number variation in DNA-seq; 
annotation of regions of interest; machine learning for exploratory analysis; 
...) and are looking for a 'package' that fulfils their use case, rather than 
for algorithms that can in principle be stitched together to form a solution. 
This encourages the packaging of solutions rather than of algorithms. This can 
sometimes be less than optimal, e.g., Import'ing a single function from a much 
larger package. Perhaps as a corollary, my opinion is that code should be 
organized into packages in which vignettes make sense.

Hope that helps, and look forward to your contributions!

Martin

>
> Thanks in advance for your kind assistance!
>
> Kind regards,
> Ulrich
>
> --
> Dr. Ulrich Bodenhofer
> Associate Professor
> Institute of Bioinformatics
>
> Johannes Kepler University
> Altenberger Str. 69
> 4040 Linz, Austria
>
> Tel. +43 732 2468 4526
> Fax +43 732 2468 4539
> bodenhofer at bioinf.jku.at
> http://www.bioinf.jku.at/
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793