[Bioc-devel] [devteam-bioc] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages.

Martin Morgan mtmorgan at fredhutch.org
Wed Dec 31 19:10:12 CET 2014

On 12/31/2014 08:47 AM, Peng Yu wrote:
> On Wed, Dec 31, 2014 at 9:41 AM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
>> On 12/24/2014 07:31 PM, Maintainer wrote:
>>> Hi,
>>> Many bioconductor packages Depends on other packages but not Imports
>>> other packages. (e.g., IRanges Depends on BiocGenerics.) Imports is
>>> usually preferred to Depends.
>>> http://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
>>> http://obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/
>>> Could the unnecessary Depends be forced to be replaced by Imports?
>>> This should improve the package load time significantly.
>> R package symbols and other objects are collated at build time into a 'name
>> space'. When used,
>> - Imports: loads the name space from disk.
>> - Depends: loads the name space from disk, and attaches it to the search()
>> path.
>> Attaching is very inexpensive compared to loading, so there is no speed
>> improvement gained by Import'ing instead of Depend'ing.
> Yes. For example, changing Depends to Imports does not improve the
> package load time much.
> But loading a package in 4 sec seems to be too long.

Generally, yes; a package should at least give the illusion of loading quickly.

4 seconds is not long in comparison to the time spent in an interactive analysis 
session or processing sequence-scale data.

Recognizing that package load times can be substantial may influence some 
approaches, e.g., avoiding unnecessary (re)loading of packages during 
development, preferring multi-core to socket or other parallelization 
strategies, using persistent R sessions when responding to web service requests.
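
As a rough illustration of the multi-core versus socket point, the sketch below uses the 'parallel' package that ships with R. The 'splines' package is only a stand-in for an expensive package (an assumption, chosen because it is part of every R installation). The idea is to check whether workers already have the namespace loaded when they start.

```r
library(parallel)

# 'splines' ships with R and stands in for a slow-loading package.
loadNamespace("splines")

# Forked workers (not available on Windows, hence the fallback to 1 core)
# start as copies of this session, so they inherit its loaded namespaces.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
forked <- unlist(mclapply(1:2,
                          function(i) "splines" %in% loadedNamespaces(),
                          mc.cores = n_cores))

# Socket workers are fresh R processes; ask each one what it has loaded.
cl <- makePSOCKcluster(2)
socket_ns <- clusterCall(cl, loadedNamespaces)
stopCluster(cl)
socket_has <- vapply(socket_ns, function(ns) "splines" %in% ns, logical(1))

print(forked)      # namespace state seen by forked workers
print(socket_has)  # namespace state seen by socket workers
```

With a default setup one would expect the forked workers to report the namespace as loaded and the socket workers not to, which is why each socket worker pays the package load cost again.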


The DESCRIPTION file of MBASED (the package timed in the quoted example) has

Depends: RUnit, BiocGenerics, BiocParallel, GenomicRanges

RUnit almost certainly belongs in Suggests: (no use to the end user; not used by 
R code except during package build / check) but this likely has minimal impact 
on load time; the major cost is the S4-heavy GenomicRanges and its dependencies.
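
To see where the time goes, loading a namespace can be timed separately from attaching it. This is a minimal sketch using 'splines' (ships with R) as a stand-in; for a meaningful comparison one would substitute a heavier package such as GenomicRanges in a fresh session, which is not assumed to be installed here.

```r
# 'splines' ships with R; it stands in for a package listed in Depends: or Imports:.
pkg <- "splines"

# Start from a state where the namespace is neither loaded nor attached.
if (pkg %in% loadedNamespaces()) unloadNamespace(pkg)

# What Imports: triggers: load the namespace from disk.
t_load <- system.time(loadNamespace(pkg))[["elapsed"]]

# What Depends: adds on top: attach the already loaded namespace to search().
t_attach <- system.time(attachNamespace(pkg))[["elapsed"]]

cat(sprintf("load: %.3fs  attach: %.3fs\n", t_load, t_attach))
```

The two timings depend on the machine and on disk caching, so they are best read as relative magnitudes rather than exact figures.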

During start-up a reasonable (e.g., 25%) performance benefit can be realized by 
telling R to allocate additional memory up-front; on my Linux box I have

$ alias Rdev
alias Rdev='R_LIBS_USER=/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/devel 
/home/mtmorgan/bin/R-devel/bin/R --no-save --quiet --min-vsize=2048M 
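
One way to check whether the --min-vsize setting helps on a given machine is to time a fresh R process with and without it. The sketch below is an assumption-laden illustration, not the setup used above: it calls the Rscript binary of the running installation, the helper name time_startup is hypothetical, and 'stats4' (an S4-based package shipped with R) stands in for a Bioconductor package. It does not claim to reproduce the 25% figure.

```r
# Path to the Rscript binary of the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Time a fresh R process that loads 'stats4'; 'flags' are extra
# command-line options passed through to R.
time_startup <- function(flags = character()) {
  expr <- "suppressPackageStartupMessages(library(stats4))"
  args <- c(flags, "-e", shQuote(expr))
  elapsed <- system.time(
    status <- system2(rscript, args, stdout = FALSE, stderr = FALSE)
  )[["elapsed"]]
  c(elapsed = elapsed, status = status)
}

default <- time_startup()
tuned   <- time_startup("--min-vsize=2048M")
print(rbind(default, tuned))
```

Repeating each measurement several times and loading the package actually of interest gives a more reliable comparison than a single run.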


>>   system.time(suppressPackageStartupMessages(library(MBASED)))
>     user  system elapsed
>    4.404   0.100   4.553
> For example, it only takes 10% of the time to load ggplot2. It seems
> that many bioconductor packages have similar problems.
>> system.time(suppressPackageStartupMessages(library(ggplot2)))
>     user  system elapsed
>    0.394   0.036   0.460
>> The main reason to Depend: on a package is because the symbols defined by
>> the package are needed by the end-user. Import'ing a package is appropriate
>> when the package provides functionality only relevant to the package author.
> What causes the load time to be too long? Is it because exporting too
> many functions from all dependent packages to the global namespace?
>> There are likely to be specific packages that mis-use Depends; packages such
>> as IRanges, GenomicRanges, etc use Depends: as intended, to  provide
>> functions that are useful to the end user.
>> Maintainers are certainly encouraged to think carefully about adding
>> packages providing functionality irrelevant to the end-user to the Depends:
>> field. The codetoolsBioC package (available from svn, see
>> http://bioconductor.org/developers/how-to/source-control/) provides some
>> mostly reliable hints to package authors about correctly formulating a
>> NAMESPACE file to facilitate using Imports: instead of Depends:.
>> General questions about Bioconductor packages should be addressed to the
>> support forum https://support.bioconductor.org.
>> Questions about Bioconductor development (such as this) should be addressed
>> to the bioc-devel mailing list (subscription required)
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel.
>> I have cc'd the bioc-devel mailing list; I hope that is ok.

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
