[Bioc-devel] [devteam-bioc] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages.

Martin Morgan mtmorgan at fredhutch.org
Wed Dec 31 19:10:12 CET 2014


On 12/31/2014 08:47 AM, Peng Yu wrote:
> On Wed, Dec 31, 2014 at 9:41 AM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
>> On 12/24/2014 07:31 PM, Maintainer wrote:
>>>
>>> Hi,
>>>
>>> Many Bioconductor packages list other packages under Depends rather
>>> than Imports (e.g., IRanges Depends on BiocGenerics). Imports is
>>> usually preferred to Depends.
>>>
>>>
>>> http://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
>>> http://obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/
>>>
>>> Could the unnecessary Depends be forced to be replaced by Imports?
>>> This should improve the package load time significantly.
>>
>>
>> R package symbols and other objects are collated at build time into a 'name
>> space'. When used,
>>
>> - Imports: loads the name space from disk.
>> - Depends: loads the name space from disk, and attaches it to the search()
>> path.
>>
>> Attaching is very inexpensive compared to loading, so there is no speed
>> improvement gained by Import'ing instead of Depend'ing.
>
> Yes. For example, changing Depends to Imports does not improve the
> package load time much.
>
> But loading a package in 4 sec seems to be too long.

Generally, yes; loading a package should at least give the illusion of being
fast.

That said, 4 seconds is not long in comparison to the time spent in an
interactive analysis session or processing sequence-scale data.
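
The load-versus-attach distinction quoted above can be checked in a fresh R
session. The following is a minimal sketch, not a benchmark; it assumes the
'splines' package (which ships with R but is not attached by default) as a
stand-in for a real dependency.

```r
## Sketch: Imports-style loading vs Depends-style attaching.
## 'splines' ships with R and is used here only as a stand-in package.
pkg <- "splines"
env <- paste0("package:", pkg)

## Imports: the name space is loaded from disk but is not put on search()
load_time <- system.time(loadNamespace(pkg))[["elapsed"]]
loaded <- pkg %in% loadedNamespaces()

## Depends: the (already loaded) name space is additionally attached
attach_time <- system.time(library(pkg, character.only = TRUE))[["elapsed"]]
attached <- env %in% search()

## Compare the two costs; the numbers depend on the machine and the package
c(load = load_time, attach = attach_time)
```

Substituting a larger package for 'splines' should make the difference between
the two steps easier to see.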

Recognizing that package load times can be substantial may influence how one
works, e.g., avoiding unnecessary (re)loading of packages during development,
preferring multi-core to socket or other parallelization strategies (forked
workers inherit the packages already loaded in the parent, whereas socket
workers start fresh R sessions), and using persistent R sessions when
responding to web service requests.

In MBASED, the DESCRIPTION file has

Depends: RUnit, BiocGenerics, BiocParallel, GenomicRanges

RUnit almost certainly belongs in Suggests: (it is of no use to the end user,
and is not used by R code except during package build / check), but moving it
likely has minimal impact on load time; the major cost is the S4-heavy
GenomicRanges and its dependencies.
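
One rough way to see where the cost comes from is to record which name spaces
a package pulls in and how long that takes. This is only a sketch: the helper
name namespaces_loaded_by() is made up for illustration, and the result is
meaningful only in a fresh session, because name spaces that are already
loaded cost nothing the second time.

```r
## Sketch: which name spaces does loading a package pull in, and how long
## does it take? namespaces_loaded_by() is a hypothetical helper, not part
## of any package.
namespaces_loaded_by <- function(pkg) {
    before <- loadedNamespaces()
    elapsed <- system.time(loadNamespace(pkg))[["elapsed"]]
    added <- setdiff(loadedNamespaces(), before)
    message(pkg, ": ", length(added), " name space(s) loaded in ",
            round(elapsed, 2), " sec")
    sort(added)
}

## 'parallel' ships with R and is only a stand-in; with the relevant packages
## installed one could try namespaces_loaded_by("GenomicRanges") instead.
namespaces_loaded_by("parallel")
```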

During start-up a reasonable (e.g., 25%) performance benefit can be realized by 
telling R to allocate additional memory up-front; on my Linux box I have

$ alias Rdev
alias Rdev='R_LIBS_USER=/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/devel /home/mtmorgan/bin/R-devel/bin/R --no-save --quiet --min-vsize=2048M --min-nsize=45M'
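
Whether these flags help on a given machine can be checked by timing the load
in fresh R processes. The sketch below assumes a Unix-alike system and that
Rscript forwards the flags to R as it does other front-end options;
time_fresh_load() is a made-up helper name, and 'stats' is only a stand-in for
a slow-loading package such as MBASED.

```r
## Sketch: time library(pkg) in a fresh R process, with optional start-up
## flags. The elapsed time includes the cost of starting R itself.
## time_fresh_load() is a hypothetical helper, not part of any package.
time_fresh_load <- function(pkg, flags = character()) {
    rscript <- file.path(R.home("bin"), "Rscript")
    expr <- sprintf("suppressPackageStartupMessages(library(%s))", pkg)
    args <- c(flags, "--vanilla", "-e", shQuote(expr))
    timing <- system.time(system2(rscript, args, stdout = FALSE, stderr = FALSE))
    timing[["elapsed"]]
}

## 'stats' is a stand-in; substitute the package of interest.
default_time <- time_fresh_load("stats")
tuned_time <- time_fresh_load("stats",
                              c("--min-vsize=2048M", "--min-nsize=45M"))
c(default = default_time, tuned = tuned_time)
```

Repeating each measurement a few times helps separate a real difference from
disk-cache effects.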

Martin

>
>>   system.time(suppressPackageStartupMessages(library(MBASED)))
>     user  system elapsed
>    4.404   0.100   4.553
>
> For example, it only takes 10% of the time to load ggplot2. It seems
> that many bioconductor packages have similar problems.
>
>> system.time(suppressPackageStartupMessages(library(ggplot2)))
>     user  system elapsed
>    0.394   0.036   0.460
>
>> The main reason to Depend: on a package is because the symbols defined by
>> the package are needed by the end-user. Import'ing a package is appropriate
>> when the package provides functionality only relevant to the package author.
>
> What causes the load time to be too long? Is it because too many
> functions are exported from all dependent packages to the global namespace?
>
>> There are likely to be specific packages that mis-use Depends; packages such
>> as IRanges, GenomicRanges, etc. use Depends: as intended, to provide
>> functions that are useful to the end user.
>>
>> Maintainers are certainly encouraged to think carefully about adding
>> packages providing functionality irrelevant to the end-user to the Depends:
>> field. The codetoolsBioC package (available from svn, see
>> http://bioconductor.org/developers/how-to/source-control/) provides some
>> mostly reliable hints to package authors about correctly formulating a
>> NAMESPACE file to facilitate using Imports: instead of Depends:.
>>
>> General questions about Bioconductor packages should be addressed to the
>> support forum https://support.bioconductor.org.
>>
>> Questions about Bioconductor development (such as this) should be addressed
>> to the bioc-devel mailing list (subscription required)
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel.
>>
>> I have cc'd the bioc-devel mailing list; I hope that is ok.
>
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


