[Bioc-devel] affypdnn: Request for moving "Depends" packages to "Suggests"
hb at stat.berkeley.edu
Tue Jul 1 23:17:16 CEST 2008
On Tue, Jul 1, 2008 at 3:07 AM, Laurent Gautier <lgautier at gmail.com> wrote:
> Trying to minimize hard package dependencies is commendable, but there
> are points to keep in mind.
> Just in case the mere thought of one day seeing their name between
> parentheses right behind "the maintainer" in an email otherwise
> addressed to all is not enough to make innocent bystanders feel
> concerned about their package and what is happening here, I'll make a
> short list of what it means to have a package listed in the "Depends"
> field, and what are possible reasons for having a package in
> The usage of the field "Suggests" came out of the need for a gradation
> in dependencies between packages, differing from "Depends" with:
> - loading the package not contingent on the presence of dependencies
> - the dependencies not being loaded
> (I suspect that the feature was used for tcltk-based GUIs causing
> problems on R installs without tcltk, and a number of things not
> working properly under Win32 at the time).
> In my view, or maybe that was the general view at the time the
> package incriminated was written, being able to run examples justified
> having a dependency listed in "Depends".
> I agree that this can be discussed, and in the particular case discussed,
> the dependencies listed can be moved to "Suggests" (there is a chance
> that the package was written before "Suggests" was in use, or just about
> the time it appeared). Still, there are many ways to decide on whether
> a package should be in "Depends" or in "Suggests", and all can
> probably be justified; there is a need for common/public guidelines
> (that I might have missed if they exist).
> I'd also like to point out that dependencies can be good, and that there
> are in my view currently not enough of them and the call for
> minimizing them should not be taken to the extreme.
> There is a fair amount of duplication in functionalities within
> bioconductor packages (count the FASTA parsers, for example), and it
> would be good if we could have more cross-talk between packages.
I fully agree with this and this is what I also mean by minimize
(reduce). Design and code should be reused and not reimplemented.
However, if the "weight" of a package is too large, that is, too many
packages need to be installed (and/or loaded) in order to reuse a
function, then there is a great risk a given function is
reimplemented/cut'n'pasted instead of "imported" (I guess this is the
downside with GNU source code). I have many favorite examples of
this, but one is definitely the function smoothScatter() of the
'geneplotter' package. To tell someone to use it, or for me to make
use of that single function in one of my packages, here is the
Depends: hierarchy (packages within asterisks are the ones that do
not come with R):
geneplotter
  Depends: *Biobase*, methods, lattice, *annotate*
Biobase
  Depends: R (>= 2.7.0), tools, methods, utils
annotate
  Depends: R (>= 2.4.0), methods, *Biobase*, *AnnotationDbi* (>= 0.1.15), *xtable*
AnnotationDbi
  Depends: R (>= 2.7.0), methods, utils, *Biobase* (>= 1.17.0), *DBI* (>= 0.2-4), *RSQLite* (>= 0.6-4)
DBI
  Depends: R (≥ 2.3.0), methods
RSQLite
  Depends: R (≥ 2.6.0), methods, DBI (≥ 0.2-3)
xtable
  Depends: R (≥ 2.6.0)
Now, try suggesting to someone not doing bioinformatics that there is
this great smoothScatter() function they can use to plot high-density
scatter plots. I should point out that things have gotten a lot better
over the last few release cycles.
I believe that if the developer is more careful in specifying what
packages are really required (Depends) and what packages are more or
less optional (Suggests), then it is more likely that the code is
modularized further (some functions are moved to other packages etc)
and code is reused by others.
In relation to making it easier to reuse code, I would also like to
suggest that developers identify cases where an algorithm can be
implemented using basic R data types (matrices, vectors, lists) and
then have wrapper functions for high-level classes (AffyBatch, eSet,
...) calling this. Not always, but it is often the case that the core
of an algorithm cannot be reused because it is hardwired to a
Bioconductor/platform-specific data type. With the above setup it is
also easier to maintain the code when, for instance, new BioC classes
appear, and to identify what parts can be optimized, say, by rewriting
it in native code. This way implementations of algorithms may be
generalized and eventually even be migrated to CRAN where the user
base is much greater (increasing CPU-mileage, quality, ...).
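
As a minimal sketch of the setup described above (not code from any
existing package), the core below works on a plain numeric matrix and a
thin wrapper adapts it to a high-level container. The names
medianCenterMatrix(), medianCenterBatch() and the "ToyBatch" class are
hypothetical; "ToyBatch" merely stands in for a class such as AffyBatch
or eSet so that the example runs without any Bioconductor package.

```r
library(methods)

# Core algorithm on a basic R data type: subtract each column's median.
# Hypothetical example function, not part of any package.
medianCenterMatrix <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  colMedians <- apply(x, MARGIN = 2, FUN = median, na.rm = TRUE)
  sweep(x, MARGIN = 2, STATS = colMedians)  # default FUN is "-"
}

# Toy stand-in for a high-level class (assumption: real classes such as
# AffyBatch expose their data through accessors rather than a bare slot).
setClass("ToyBatch", representation(exprs = "matrix"))

# Thin wrapper: extract the matrix, call the core, put the result back.
medianCenterBatch <- function(batch) {
  stopifnot(is(batch, "ToyBatch"))
  batch@exprs <- medianCenterMatrix(batch@exprs)
  batch
}

# Usage: the same core serves both plain matrices and the container.
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)
centered <- medianCenterMatrix(m)
batch <- medianCenterBatch(new("ToyBatch", exprs = m))
```

Only the wrapper needs to know about the class, so supporting a new
container means writing another few-line wrapper, and the matrix core
is the single place to optimize.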
> As an emerging guideline, would it be acceptable to say that "data"
> packages used for examples should be in "Suggests" unless there is
> good reason?
I'd say that is a good rule of thumb. There are different kinds of
data, for instance example data and annotation data, but I'd say the
rule is more likely to apply than not to apply.
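
A rough sketch of what using a "Suggests" package could look like in
practice: the optional package is loaded with require() only at the
point where it is needed, with a clear error if it is missing. The
helper name useSuggested() is hypothetical and only for illustration.

```r
# Hypothetical helper: load an optional ("Suggests") package on demand
# and fail with an informative message if it is not installed.
useSuggested <- function(pkg) {
  # require() returns FALSE (with a warning) when the package is unavailable
  ok <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if (!ok) {
    stop("Package '", pkg, "' is needed here but is not installed.")
  }
  invisible(TRUE)
}

# In an example or help page that needs the example data, one would call:
# useSuggested("affydata")
```

With this pattern the main package still installs and loads when the
data packages are absent; only the code paths that need them complain.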
> PS: I have not found the function "request", I assume that "require" was meant.
My bad, but I could always create a package with that function calling
require() to cover my mistake ;)
> PPS: Nothing personal about FASTA, I took the example because I own
> one of the parsers ;-)
> 2008/7/1 Henrik Bengtsson <hb at stat.berkeley.edu>:
>> this one is mainly for the maintainer (Laurent Gautier), but I post it
>> to bioc-devel also as a request to minimize package dependencies in R
>> and BioC in general:
>> The current DESCRIPTION of 'affypdnn' is:
>> Package: affypdnn
>> Version: 1.14.3
>> Depends: R (>= 2.3.0), affy (>= 1.5), affydata, hgu95av2probe
>> Could you please update this to:
>> Package: affypdnn
>> Version: 1.14.3
>> Depends: R (>= 2.3.0), affy (>= 1.5)
>> Suggests: affydata, hgu95av2probe
>> and use request("affydata") and request("hgu95av2probe") where those
>> two are actually needed?