[Bioc-devel] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages

Peter Haverty haverty.peter at gene.com
Sat Jan 3 17:56:07 CET 2015


There are a few other changes in there too, but profiling did identify
low-hanging fruit like the sapply calls. From there I have found a long list of
refactoring opportunities that offer ~2% improvements. These may not be
worth the risk of reversions, however. I'll be putting together a patch
proposal with the easy changes in the next few days.
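
As a rough illustration of the kind of change involved (a minimal sketch only,
not the actual patch; the `slots` list is made-up data and does not come from
the methods package): sapply() has to inspect its results to decide what shape
to return, while vapply() is told the type and length of each result up front.

```r
# Hypothetical input: a named list of character vectors (illustration only).
slots <- list(a = c("x", "y"), b = "z", c = character(0))

# sapply() simplifies the per-element results after the fact.
n_sapply <- sapply(slots, length)

# vapply() declares that every result is a single integer.
n_vapply <- vapply(slots, length, integer(1))

# On well-formed input the two agree.
same_result <- identical(n_sapply, n_vapply)

# On empty input they differ: sapply() falls back to an empty list,
# vapply() still returns a zero-length integer vector.
empty_sapply <- sapply(list(), length)
empty_vapply <- vapply(list(), length, integer(1))
```

Wrapping either call in system.time() over a realistically sized input is one
way to check whether the substitution matters in a given code path.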

Regards,

Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com

On Fri, Jan 2, 2015 at 1:58 AM, Michael Lawrence <lawrence.michael at gene.com>
wrote:

> Pete Haverty is the one working on this. He has almost cut loading time in
> half by just changing some sapply and lapply calls to vapply calls. Most
> likely because allocating all of those list elements is expensive, and
> Martin's memory parameters also help with that.
>
> On Wed, Dec 31, 2014 at 10:30 PM, Hervé Pagès <hpages at fredhutch.org>
> wrote:
>
> > Hi Gordon,
> >
> > My guess is that it has to do with how many symbols get exported.
> > For example on my machine, doing library(limma) in a fresh session
> > takes 0.261s and triggers export of 292 symbols (as reported by
> > ls(..., all.names=TRUE)). Doing library(GenomicRanges) in a fresh
> > session takes 2.724s and triggers export of 1581 symbols (counting
> > the symbols exported by all the packages that get loaded).
> >
> > Michael, it's great to hear that somebody is working on speeding up
> > the code in charge of this.
> >
> > Happy New Year everybody!
> > H.
> >
> >
> >
> > On 12/31/2014 06:07 PM, Gordon K Smyth wrote:
> >
> >> Hi Michael,
> >>
> >> What aspect of the methods package causes the slowness?
> >>
> >> There are many packages (limma for one) that depend on methods but load
> >> quickly.
> >>
> >> Regards
> >> Gordon
> >>
> >>
> >>  Date: Wed, 31 Dec 2014 09:17:01 -0800
> >>> From: Michael Lawrence <lawrence.michael at gene.com>
> >>> To: Peng Yu <pengyu.ut at gmail.com>
> >>> Cc: Bioconductor Package Maintainer <maintainer at bioconductor.org>,
> >>>     "bioc-devel at r-project.org" <bioc-devel at r-project.org>
> >>> Subject: Re: [Bioc-devel] [devteam-bioc] Use Imports instead of
> >>>     Depends in the DESCRIPTION files of bioconductor packages.
> >>>
> >>> The slowness is due to the methods package. We're working on it.
> >>>
> >>> Michael
> >>>
> >>> On Wed, Dec 31, 2014 at 8:47 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
> >>>
> >>>> On Wed, Dec 31, 2014 at 9:41 AM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
> >>>>
> >>>>> On 12/24/2014 07:31 PM, Maintainer wrote:
> >>>>>
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Many Bioconductor packages list other packages under Depends rather
> >>>>>> than Imports (e.g., IRanges Depends on BiocGenerics). Imports is
> >>>>>> usually preferred to Depends.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> http://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
> >>>>>> http://obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/
> >>>>>>
> >>>>>> Could the unnecessary Depends be forced to be replaced by Imports?
> >>>>>> This should improve the package load time significantly.
> >>>>>>
> >>>>>
> >>>>>
> >>>>> R package symbols and other objects are collated at build time into a
> >>>>> 'name space'. When used,
> >>>>>
> >>>>> - Import: loads the name space from disk.
> >>>>> - Depends: loads the name space from disk, and attaches it to the
> >>>>>   search() path.
> >>>>>
> >>>>> Attaching is very inexpensive compared to loading, so there is no speed
> >>>>> improvement gained by Import'ing instead of Depend'ing.
> >>>>>
> >>>>
> >>>> Yes. For example, changing Depends to Imports does not improve the
> >>>> package load time much.
> >>>>
> >>>> But loading a package in 4 sec seems to be too long.
> >>>>
> >>>>   system.time(suppressPackageStartupMessages(library(MBASED)))
> >>>>    user  system elapsed
> >>>>   4.404   0.100   4.553
> >>>>
> >>>> For example, it only takes 10% of the time to load ggplot2. It seems
> >>>> that many bioconductor packages have similar problems.
> >>>>
> >>>>   system.time(suppressPackageStartupMessages(library(ggplot2)))
> >>>>    user  system elapsed
> >>>>   0.394   0.036   0.460
> >>>>
> >>>>> The main reason to Depend: on a package is because the symbols defined by
> >>>>> the package are needed by the end-user. Import'ing a package is appropriate
> >>>>> when the package provides functionality only relevant to the package
> >>>>> author.
> >>>>
> >>>> What causes the load time to be too long? Is it because too many
> >>>> functions are exported from all dependent packages to the global namespace?
> >>>>
> >>>>> There are likely to be specific packages that mis-use Depends; packages
> >>>>> such as IRanges, GenomicRanges, etc. use Depends: as intended, to provide
> >>>>> functions that are useful to the end user.
> >>>>>
> >>>>> Maintainers are certainly encouraged to think carefully about adding
> >>>>> packages providing functionality irrelevant to the end-user to the
> >>>>> Depends: field. The codetoolsBioC package (available from svn, see
> >>>>> http://bioconductor.org/developers/how-to/source-control/) provides some
> >>>>> mostly reliable hints to package authors about correctly formulating a
> >>>>> NAMESPACE file to facilitate using Imports: instead of Depends:.
> >>>>>
> >>>>> General questions about Bioconductor packages should be addressed to the
> >>>>> support forum https://support.bioconductor.org.
> >>>>>
> >>>>> Questions about Bioconductor development (such as this) should be addressed
> >>>>> to the bioc-devel mailing list (subscription required)
> >>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel.
> >>>>>
> >>>>> I have cc'd the bioc-devel mailing list; I hope that is ok.
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Regards,
> >>>> Peng
> >>>>
> >>>
> >>
> >
> > _______________________________________________
> > Bioc-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
>
