[Bioc-devel] [devteam-bioc] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages.
mtmorgan at fredhutch.org
Wed Dec 31 19:10:12 CET 2014
On 12/31/2014 08:47 AM, Peng Yu wrote:
> On Wed, Dec 31, 2014 at 9:41 AM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
>> On 12/24/2014 07:31 PM, Maintainer wrote:
>>> Many Bioconductor packages list other packages under Depends rather
>>> than Imports (e.g., IRanges Depends on BiocGenerics). Imports is
>>> usually preferred to Depends.
>>> Could the unnecessary Depends be forced to be replaced by Imports?
>>> This should improve the package load time significantly.
>> R package symbols and other objects are collated at build time into a 'name
>> space'. When a package is used,
>> - Imports: loads the name space from disk.
>> - Depends: loads the name space from disk, and attaches it to the search()
>>   path.
>> Attaching is very inexpensive compared to loading, so there is no speed
>> improvement gained by Import'ing instead of Depend'ing.
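The load / attach distinction can be seen in a fresh R session. This is only a sketch: 'tools' is used as a stand-in because it ships with R and is not attached by default, and loadNamespace() is used here to imitate what Imports: triggers.

```r
# 'tools' ships with R and is not attached by default, so it stands in for
# any package that could be listed under Imports: or Depends:.
pkg <- "tools"
search_entry <- paste0("package:", pkg)

# Roughly what Imports: does: load the name space, leave search() unchanged.
loadNamespace(pkg)
loaded_only <- pkg %in% loadedNamespaces()
attached_before <- search_entry %in% search()

# What Depends: adds on top: attach the loaded name space to the search path.
library(pkg, character.only = TRUE)
attached_after <- search_entry %in% search()

# Symbols from a loaded-but-unattached name space are still reachable by
# qualified name, which is how package code uses an imported dependency.
ext <- tools::file_ext("reads.fastq")

print(c(loaded = loaded_only, attached_before = attached_before,
        attached_after = attached_after))
```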
> Yes. For example, changing Depends to Imports does not improve the
> package load time much.
> But loading a package in 4 sec seems to be too long.
Generally, yes; a package should at least give the illusion of loading fast.
Still, 4 seconds is not long in comparison to the time spent in an interactive
analysis session or processing sequence-scale data.
Recognizing that package load times can be substantial may influence some
approaches, e.g., avoiding unnecessary (re)loading of packages during
development, preferring multi-core to socket or other parallelization
strategies, using persistent R sessions when responding to web service requests.
In MBASED, the DESCRIPTION file has
Depends: RUnit, BiocGenerics, BiocParallel, GenomicRanges
RUnit almost certainly belongs in Suggests: (no use to the end user; not used by
R code except during package build / check) but this likely has minimal impact
on load time; the major cost is the S4-heavy GenomicRanges and its dependencies.
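As a sketch of the suggested change, the fragment below parses a hypothetical revised DESCRIPTION with read.dcf(). Only the package names come from this thread; the Suggests: line and the split_field() helper are assumptions made for illustration.

```r
# Hypothetical DESCRIPTION fragment with RUnit moved from Depends: to Suggests:.
desc_lines <- c(
  "Package: MBASED",
  "Depends: BiocGenerics, BiocParallel, GenomicRanges",
  "Suggests: RUnit"
)

# read.dcf() accepts a connection; absent fields come back as NA.
con <- textConnection(desc_lines)
desc <- read.dcf(con, fields = c("Depends", "Imports", "Suggests"))
close(con)

# Illustrative helper: split a comma-separated field into package names.
split_field <- function(x) {
  if (is.na(x)) character(0) else trimws(strsplit(x, ",")[[1]])
}

attached_on_load <- split_field(desc[1, "Depends"])   # visible to the end user
test_only        <- split_field(desc[1, "Suggests"])  # needed for build / check only

print(attached_on_load)
print(test_only)
```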
During start-up a reasonable (e.g., 25%) performance benefit can be realized by
telling R to allocate additional memory up-front; on my Linux box I have
$ alias Rdev
/home/mtmorgan/bin/R-devel/bin/R --no-save --quiet --min-vsize=2048M
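The user / system / elapsed figures quoted in this thread have the layout printed by system.time(). A minimal sketch for timing the load and attach steps separately follows; 'stats4' is only a stand-in that ships with R, and the script should be run in a fresh session so the name space is not already loaded.

```r
# Stand-in package shipped with R; substitute the package of interest.
pkg <- "stats4"

# Time reading the name space from disk (the expensive step).
t_load <- system.time(loadNamespace(pkg))

# Time attaching the already-loaded name space to the search path.
t_attach <- system.time(library(pkg, character.only = TRUE))

# Keep only wall-clock seconds for a side-by-side comparison.
elapsed <- c(load = t_load[["elapsed"]], attach = t_attach[["elapsed"]])
print(elapsed)
```

Running the same script with and without the --min-vsize setting above is one way to check whether the up-front allocation helps on a given machine.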
> user system elapsed
> 4.404 0.100 4.553
> For example, it only takes 10% of the time to load ggplot2. It seems
> that many bioconductor packages have similar problems.
> user system elapsed
> 0.394 0.036 0.460
>> The main reason to Depend: on a package is because the symbols defined by
>> the package are needed by the end-user. Import'ing a package is appropriate
>> when the package provides functionality only relevant to the package author.
> What causes the load time to be too long? Is it because too many
> functions are exported from all dependent packages to the global namespace?
>> There are likely to be specific packages that mis-use Depends; packages such
>> as IRanges, GenomicRanges, etc use Depends: as intended, to provide
>> functions that are useful to the end user.
>> Maintainers are certainly encouraged to think carefully about adding
>> packages providing functionality irrelevant to the end-user to the Depends:
>> field. The codetoolsBioC package (available from svn, see
>> http://bioconductor.org/developers/how-to/source-control/) provides some
>> mostly reliable hints to package authors about correctly formulating a
>> NAMESPACE file to facilitate using Imports: instead of Depends:.
>> General questions about Bioconductor packages should be addressed to the
>> support forum https://support.bioconductor.org.
>> Questions about Bioconductor development (such as this) should be addressed
>> to the bioc-devel mailing list (subscription required)
>> I have cc'd the bioc-devel mailing list; I hope that is ok.
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793