[Bioc-devel] Upcoming release: *Please* revise your Depends and Suggests of DESCRIPTION

Henrik Bengtsson hb at stat.berkeley.edu
Thu Oct 16 20:39:00 CEST 2008


Hi,

good points.

On Thu, Oct 16, 2008 at 12:42 AM, laurent <lgautier at gmail.com> wrote:
> Henrik,
>
> I agree that the installation footprint is/has been an issue, and I would
> think that it could be addressed more efficiently than going the "less
> dependencies" route.

I wouldn't say less "dependencies", but rather "differently
classified" dependencies ("Suggests" could read "Optional" and
"Depends" could read "Critical"; my view).

>
> Having code and data mixed into one package, and the absence of a common
> recommended data source for data types, are probably responsible for a
> much larger fraction of the footprint than the dependencies are.

I definitely agree.  My personal take on this is that for most
algorithms the developer could provide:

(i) a high-level function taking objects of more complex classes (e.g.
eSet, ExpressionSet, AffyBatch, RGList, ...), which then utilizes:
(ii) a low-level function operating on basic data types (e.g. vectors
and matrices) and that implements the actual algorithm.

The (i) functions target end users and higher-level calls from other
packages, and the (ii) functions mainly target other developers (and
some users).  The API of the low-level functions is less likely to
change, whereas the high-level API tends to change more often (as new
classes are introduced).  Several packages already have an internal
low-level interface, but it is not always explicit which functions
these are or that they are part of a supported API (namespaces
sometimes clarify this).
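
To make the (i)/(ii) split concrete, here is a minimal sketch in plain R.
The function names (medianCenterMatrix, medianCenter) and the
"MiniExprSet" class are made up for illustration; the class is only a
stand-in for a real container such as ExpressionSet, and the median
centering is a placeholder for whatever algorithm a package provides.

```r
# (ii) Low-level function: operates on a plain numeric matrix
# (rows = features, columns = samples) and holds the actual algorithm.
medianCenterMatrix <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  # Subtract each column's median from that column.
  sweep(x, MARGIN = 2, STATS = apply(x, 2, median, na.rm = TRUE))
}

# (i) High-level function: an S3 generic dispatching on the container class.
medianCenter <- function(object, ...) UseMethod("medianCenter")

# Method for the hypothetical "MiniExprSet" class (a list with a matrix in
# element 'exprs'); it only unpacks, delegates to (ii), and repacks.
medianCenter.MiniExprSet <- function(object, ...) {
  object$exprs <- medianCenterMatrix(object$exprs)
  object
}

# Usage: end users call (i); other developers can call (ii) directly.
x <- matrix(c(1, 2, 3, 10, 20, 30), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("s1", "s2")))
eset <- structure(list(exprs = x), class = "MiniExprSet")
centered <- medianCenter(eset)
```

If a new container class comes along, only a new thin method is needed;
the low-level function and its API stay untouched.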

If all low-level functions are put in a separate package, then we
achieve part of what you suggest.  It will also be easier to maintain
the low-level as well as the high-level functions, and it will be
easier for others to contribute, say, optimized code for the actual
algorithm (which is in the low-level API).

>
> Regarding the example you picked this time, it can be seen as
> disputable. "snapCGH" is clearly advertised as an "umbrella" package,
> unifying several other aCGH-related packages under one common framework.
> One should still be interested in the package, one should also suspect
> that it is going to require other packages (and their respective
> dependencies in turn), shouldn't he ?

Yes, and the life of Bioconductor packages is a dynamic process.
Maybe I picked a bad example this time, and it might be that all the
packages are heavily used most of the time by snapCGH.  My entry point
to snapCGH was the Bioinformatics Application Note (Marioni et al.
2006) on the BioHMM model, and the umbrella features of snapCGH are not
the main point of that package.  This might still illustrate a use
case as well as the usefulness of a high-/low-level package split (and
that the BioHMM model would benefit from being available in a package
of its own).

See also my comments on the frequently used function smoothScatter()
in "heavy-weight" geneplotter, cf.

 https://stat.ethz.ch/pipermail/bioc-devel/2008-July/001640.html

>
> As a side note, it can seem difficult to objectively judge what
> functions are "rarely called" in a package without actual usage data.

I should have used "optionally" instead.  It is often the developer
who knows when and where certain packages are used and in several
cases it is only some of the loaded packages that are used in any R
session.  For example, in aroma.affymetrix, we only utilize the
'EBImage' package for generating PNGs showing data spatially.  Not all
people do this, and those who do, do not necessarily do it in every
session.  Since this is "rarely"/"optionally" done, the EBImage package
is listed under 'Suggests' and we load it with require() whenever
needed.
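
As a rough sketch of that pattern (not the actual aroma.affymetrix
code), a package function can load a 'Suggests' package only at the
point where it is needed and fail with a clear message otherwise.  The
helper name useSuggested() is hypothetical.

```r
# Hypothetical helper: load an optional ('Suggests') package on demand.
useSuggested <- function(pkg) {
  # require() returns FALSE (with a warning) if the package is missing;
  # the warning is suppressed here and replaced by an explicit error.
  ok <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if (!ok) {
    stop("Package '", pkg, "' is needed for this feature; ",
         "please install it first.", call. = FALSE)
  }
  invisible(TRUE)
}

# Usage: call the helper at the top of the few functions that need the
# optional package; all other functions work without it being installed.
res <- useSuggested("stats")  # 'stats' ships with R, so this succeeds
```

Users who never call the optional feature never have to download or
load the suggested package.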

Our dependency on EBImage illustrates another argument, which is
software robustness.  EBImage was broken on Windows for several months
(I'm glad to see that it is now fixed), and with a hard dependency
aroma.affymetrix would have been impossible to install and use on
Windows during that time.  In this case I guess the reason was that
the original developer handed over the maintenance of EBImage, which
caused some startup delays for the new maintainer; these are things we
are always going to face with most packages at some stage or other.
As developers we can somewhat protect ourselves *and downstream
developers/users* against this by using the 'Suggests/Imports'
fields.

Finally, if one wants to install all packages, including those in
'Suggests', one can do:

 biocLite("<pkg/group>", dependencies=c("Depends", "Suggests", "Imports"))

Cheers

Henrik

>
>
>
> L.
>
>
>
> On Wed, 2008-10-15 at 13:11 -0700, Henrik Bengtsson wrote:
>> Hi *all* package developers
>>
>> for the upcoming Bioc release, as a developer, could you please revise
>> what packages you put under 'Depends' in your DESCRIPTION files.
>>
>> In many cases packages listed there are only used occasionally in a
>> few rarely called functions.  In such cases it is recommended to put
>> such packages under 'Suggests' instead and use require("<pkg>")
>> wherever they are needed.   This will decrease the download/installation
>> footprint.
>>
>> Without picking on a particular package (I've used a different package
>> before), here is an illustrative example involving several packages
>> with large package footprints:
>>
>> In order to use the runBioHMM() segmentation method in the snapCGH package:
>>
>> Package: snapCGH
>> Depends: limma, tilingArray, DNAcopy, GLAD, cluster, methods, aCGH
>> Suggests:
>> Imports:
>>
>> this is what you need to download and install (illustrated package by package):
>>
>> > biocLite("limma");
>>
>> trying URL 'http://bioconductor.org/packages/2.3/bioc/bin/windows/contrib/2.8/limma_2.15.15.zip'
>> Content type 'application/zip' length 1499394 bytes (1.4 Mb)
>> opened URL
>> downloaded 1016 Kb
>>
>>
>> > biocLite("tilingArray");
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.8/xtable_1.5-4.zip'
>> Content type 'application/zip' length 215584 bytes (210 Kb)
>> opened URL
>> downloaded 210 Kb
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.8/DBI_0.2-4.zip'
>> Content type 'application/zip' length 442504 bytes (432 Kb)
>> opened URL
>> downloaded 432 Kb
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.8/RSQLite_0.7-0.zip'
>> Content type 'application/zip' length 599395 bytes (585 Kb)
>> opened URL
>> downloaded 585 Kb
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.8/zoo_1.5-4.zip'
>> Content type 'application/zip' length 872593 bytes (852 Kb)
>> opened URL
>> downloaded 852 Kb
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.8/sandwich_2.1-0.zip'
>> Content type 'application/zip' length 758762 bytes (740 Kb)
>> opened URL
>> downloaded 740 Kb
>>
>> trying URL 'http://bioconductor.org/packages/2.3/bioc/bin/windows/contrib/2.8/annotate_1.19.3.zip'
>> Content type 'application/zip' length 1966201 bytes (1.9 Mb)
>> opened URL
>> downloaded 1.3 Mb
>>
>> trying URL 'http://bioconductor.org/packages/2.3/bioc/bin/windows/contrib/2.8/AnnotationDbi_1.3.12.zip'
>> Content type 'application/zip' length 1517547 bytes (1.4 Mb)
>> opened URL
>> downloaded 1.4 Mb
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.8/strucchange_1.3-4.zip'
>> Content type 'application/zip' length 937367 bytes (915 Kb)
>> opened URL
>> downloaded 915 Kb
>>
>> trying URL 'http://bioconductor.org/packages/2.3/bioc/bin/windows/contrib/2.8/vsn_3.7.7.zip'
>> Content type 'application/zip' length 1328450 bytes (1.3 Mb)
>> opened URL
>> downloaded 1.3 Mb
>>
>> trying URL 'http://bioconductor.org/packages/2.3/bioc/bin/windows/contrib/2.8/genefilter_1.21.5.zip'
>> Content type 'application/zip' length 483979 bytes (472 Kb)
>> opened URL
>> downloaded 472 Kb
>>
>> trying URL 'http://bioconductor.org/packages/2.3/bioc/bin/windows/contrib/2.8/geneplotter_1.19.6.zip'
>> Content type 'application/zip' length 1446820 bytes (1.4 Mb)
>> opened URL
>> downloaded 1.4 Mb
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.8/pixmap_0.4-9.zip'
>> Content type 'application/zip' length 118653 bytes (115 Kb)
>> opened URL
>> downloaded 115 Kb
>>
>> trying URL 'http://bioconductor.org/packages/2.3/bioc/bin/windows/contrib/2.8/tilingArray_1.19.2.zip'
>> Content type 'application/zip' length 2291041 bytes (2.2 Mb)
>> opened URL
>> downloaded 2.2 Mb
>>
>>
>> > biocLite("DNAcopy");
>>
>> trying URL 'http://bioconductor.org/packages/2.2/bioc/bin/windows/contrib/2.7/DNAcopy_1.14.0.zip'
>> Content type 'application/zip' length 389689 bytes (380 Kb)
>> opened URL
>> downloaded 380 Kb
>>
>>
>> > biocLite("GLAD");
>>
>> trying URL 'http://bioconductor.org/packages/2.2/bioc/bin/windows/contrib/2.7/GLAD_1.16.0.zip'
>> Content type 'application/zip' length 1821831 bytes (1.7 Mb)
>> opened URL
>> downloaded 1.7 Mb
>>
>>
>> > biocLite("cluster");
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.7/cluster_1.11.11.zip'
>> Content type 'application/zip' length 517782 bytes (505 Kb)
>> opened URL
>> downloaded 505 Kb
>>
>>
>> > biocLite("aCGH");
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.7/multtest_1.21.1.zip'
>> Content type 'application/zip' length 1653667 bytes (1.6 Mb)
>> opened URL
>> downloaded 1.6 Mb
>>
>> trying URL 'http://cran.fhcrc.org/bin/windows/contrib/2.7/sma_0.5.15.zip'
>> Content type 'application/zip' length 3070300 bytes (2.9 Mb)
>> opened URL
>> downloaded 2.9 Mb
>>
>> trying URL 'http://bioconductor.org/packages/2.2/bioc/bin/windows/contrib/2.7/aCGH_1.14.0.zip'
>> Content type 'application/zip' length 6692045 bytes (6.4 Mb)
>> opened URL
>> downloaded 6.4 Mb
>>
>> > biocLite("snapCGH");
>>
>> trying URL 'http://bioconductor.org/packages/2.3/bioc/bin/windows/contrib/2.8/snapCGH_1.9.5.zip'
>> Content type 'application/zip' length 1308714 bytes (1.2 Mb)
>> opened URL
>> downloaded 1.2 Mb
>>
>>
>> POINT MADE?
>>
>> Cheers
>>
>> Henrik
>>
>> PS. I'd like to suggest that this is checked as part of the initial
>> package review. DS.
>>
>> _______________________________________________
>> Bioc-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>


