[Bioc-devel] depends on packages providing classes

Michael Lawrence lawrence.michael at gene.com
Tue Oct 28 22:50:18 CET 2014


This is a recent change to R; I think 3.1.1.

I think many of us (including myself) are (ab)using Depends to deploy a
convenient "context" for performing the sorts of work related to the
package. This includes the ability to construct inputs and manipulate
outputs. But in many ways that context is orthogonal to the package itself.
Really what we need is a way to define a context for e.g. chip-seq analysis
and distribute it to users. We could implement this using a "meta" package
(BiocChipseq), but Gabe Becker's switchr package is another possible
foundation. Currently, switchr does not actually load packages when
switching to a context, but we have plans for adding "profile" support, so
that a context runs a script on load. That script might attach packages,
set options, etc.
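
The "profile" idea above, a script that runs when switching to a context and attaches packages, might look like the minimal sketch below. It assumes a hypothetical chip-seq context. The package list and the attach_context() helper are illustrations only. They are not part of switchr or of any existing meta package.

```r
## Hypothetical profile script for a chip-seq "context".
## The package list is an illustrative assumption.
chipseq_pkgs <- c("GenomicRanges", "rtracklayer")

## Attach every installed package in 'pkgs' to the search path and
## return (invisibly) a named logical vector saying which were attached.
attach_context <- function(pkgs) {
  attached <- vapply(pkgs, function(p) {
    if (!requireNamespace(p, quietly = TRUE))
      return(FALSE)  # not installed: skip instead of failing
    suppressPackageStartupMessages(library(p, character.only = TRUE))
    TRUE
  }, logical(1))
  if (!all(attached))
    warning("not installed: ", paste(pkgs[!attached], collapse = ", "))
  invisible(attached)
}

## A profile could also set options at this point.
attach_context(chipseq_pkgs)
```

Keeping the attach step in a context script, rather than in each analysis package's Depends, leaves those packages free to list only what their own code needs under Imports.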






On Tue, Oct 28, 2014 at 12:42 PM, Vincent Carey <stvjc at channing.harvard.edu> wrote:

> On Tue, Oct 28, 2014 at 2:29 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
> > Hi,
> >
> > On 10/28/2014 08:48 AM, Vincent Carey wrote:
> >
> >> On Tue, Oct 28, 2014 at 11:23 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
> >>
> >>> Well, first I want to make sure that there is not something special
> >>> regarding S4 methods and classes. I have a feeling that they are a
> >>> special case.
> >>>
> >>> Second, while I agree with Jim's general opinion, it is a little bit
> >>> different when I return objects whose classes are defined in other
> >>> packages. If I don't depend on this other package, the user is hosed
> >>> wrt. the return object, unless I manually export all classes from this
> >>> other
> >>>
> >>>
> >> In what sense?  If you return an instance of GRanges, certain things can
> >> be done even if GenomicRanges is not attached.
> >>
> >
> > Yes certain things maybe, but it's hard to predict which ones.
> >
> >> You can get values of slots, for example.
> >>
> >> With the following little package
> >>
> >> %vjcair> cat foo/NAMESPACE
> >> importFrom(IRanges, IRanges)
> >> importClassesFrom(GenomicRanges, GRanges)
> >> importFrom(GenomicRanges, GRanges)
> >> export(myfun)
> >>
> >> %vjcair> cat foo/DESCRIPTION
> >> Package: foo
> >> Title: foo
> >> Version: 0.0.0
> >> Author: VJ Carey <stvjc at channing.harvard.edu>
> >> Description:
> >> Suggests:
> >> Depends:
> >> Imports: GenomicRanges, IRanges
> >> Maintainer: VJ Carey <stvjc at channing.harvard.edu>
> >> License: Private
> >> LazyLoad: yes
> >>
> >> %vjcair> cat foo/R/*
> >> myfun = function(seqnames="1", ranges=IRanges(1,2), ...)
> >>     GRanges(seqnames=seqnames, ranges=ranges, ...)
> >>
> >> The following works:
> >>
> >> > library(foo)
> >> > x = myfun()
> >> > x
> >> GRanges object with 1 range and 0 metadata columns:
> >>        seqnames    ranges strand
> >>           <Rle> <IRanges>  <Rle>
> >>    [1]        1    [1, 2]      *
> >>    -------
> >>    seqinfo: 1 sequence from an unspecified genome; no seqlengths
> >>
> >> So the show method works, even though I have not touched it.  (I did not
> >> expect it to work, in fact.)
> >>
> >
> > Exactly. Let's call it luck ;-)
> >
> >> Additionally, I can get access to slots.
> >
> > The end user should never try to access slots directly but use getters
> > and setters instead. And most getters and setters for GRanges objects
> > are defined and documented in the GenomicRanges package. Those that are
> > not are defined in packages that GenomicRanges depends on.
> >
> >> But ranges() fails.  If I, the user, want to use it, I need to arrange
> >> for that.
> >>
> >
> > IMO if your package returns a GRanges object to the user, then the user
> > should be able to access the man page for GRanges objects with ?GRanges.
> >
>
> Oddly enough, that seems to be incorrect.  I added a man page to foo that
> has a \link[GenomicRanges]{GRanges-class}.  I ran help.start and the
> cross reference from my man page succeeds.  Furthermore with the
> sessionInfo below, ?GRanges succeeds at the CLI.  I am not trying to
> defend the NOTE but the principle of minimizing Depends declarations
> needs to be considered critically, and I am just exploring the space.
>
> > ?GRanges  # it worked as usual in the tty
> > sessionInfo()
> R version 3.1.1 (2014-07-10)
> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     tools     methods
> [8] base
>
> other attached packages:
> [1] foo_0.0.0            rmarkdown_0.3.8      knitr_1.6
> [4] weaver_1.31.0        codetools_0.2-9      digest_0.6.4
> [7] BiocInstaller_1.16.0
>
> loaded via a namespace (and not attached):
>  [1] BiocGenerics_0.11.5   evaluate_0.5.5        formatR_1.0
>  [4] GenomeInfoDb_1.1.26   GenomicRanges_1.17.48 htmltools_0.2.6
>  [7] IRanges_1.99.32       parallel_3.1.1        S4Vectors_0.2.8
> [10] stats4_3.1.1          stringr_0.6.2         XVector_0.5.8
>
>
> > And that works only if the GenomicRanges package is attached. Attaching
> > GenomicRanges will also attach other packages that GenomicRanges depends
> > on where some GRanges accessors might be defined and documented (e.g.
> > metadata()).
> >
> >
> >>
> >> In some cases you'll decide you want the user to have a full complement
> >> of methods for your package to function meaningfully.  For example, I am
> >> considering using dplyr idioms to work with data structures in a
> >> package, and it seems I should just depend on dplyr rather than pick out
> >> and document which things I want to expose.  But that may still be an
> >> undesirable design.
> >>
> >>
> >>> package, like
> >>>    importClassesFrom("GenomicRanges", "GRanges")
> >>>    exportClasses("GRanges")
> >>> Surely that is not intended.
> >>>
> >>> It is important that my package works without being attached to the
> >>> search path and I do this by carefully importing what I need, i.e. my
> >>> code does not require that my dependencies are attached to the search
> >>> path.  But the end user will be hosed without it.
> >>>
> >>
> > Yes s/he will. Fortunately when your package namespace gets loaded by
> > another package, then nothing gets attached to the search path, even if
> > your package depends (instead of imports) on other packages. So using
> > Depends instead of Imports for your own dependencies won't make any
> > difference in that respect, which is good.
> >
> >
> >>> My impression is that the NOTE in R CMD check was written by someone
> >>> who did not anticipate large-scale use and re-use of classes and
> >>> methods across many packages.
> >>>
> >>
> > That's my impression too.
> >
> > Cheers,
> > H.
> >
> >
> >>> Best,
> >>> Kasper
> >>>
> >>>
> >>> On Tue, Oct 28, 2014 at 11:14 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
> >>>
> >>>> I agree with Vince. It's your job as a package developer to make
> >>>> available to your package all the functions necessary for the package
> >>>> to work. But I am not sure it is your job to load all the packages that
> >>>> your end user might need.
> >>>>
> >>>> Best,
> >>>>
> >>>> Jim
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Oct 28, 2014 at 11:04 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
> >>>>
> >>>>> On Tue, Oct 28, 2014 at 10:19 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
> >>>>>
> >>>>>> What is the current best paradigm for using all the classes in
> >>>>>> S4Vectors/GenomeInfoDb/GenomicRanges/IRanges?
> >>>>>>
> >>>>>> I obviously import methods and classes from the relevant packages.
> >>>>>>
> >>>>>> But shouldn't I depend on these packages as well, since I basically
> >>>>>> want the user to have this functionality at the command line?  That
> >>>>>> is what I do now.
> >>>>>>
> >>>>>>
> >>>>> I've wondered about this as well.  It seems the principle is that the
> >>>>> user should take care of attaching additional packages when needed.
> >>>>> It might be appropriate to give a hint in the package startup message,
> >>>>> if having some other package attached would typically be of great
> >>>>> utility.
> >>>>>
> >>>>> Given your list above, I would think that depending on GenomicRanges
> >>>>> would often be sufficient, and IRanges/S4Vectors would not require
> >>>>> dependency assertion.  I would think that GenomeInfoDb should be a
> >>>>> voluntary attachment for a specific session.
> >>>>>
> >>>>> These are just my guesses -- I doubt there will be complete consensus,
> >>>>> but I have started to think very critically about using Depends, and
> >>>>> I think it is better when its use is minimized.
> >>>>>
> >>>>>
> >>>>>> That of course leads to the R CMD check NOTE on depending on too many
> >>>>>> packages.... I guess I should ignore that one.
> >>>>>>
> >>>>>> Best,
> >>>>>> Kasper
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Bioc-devel at r-project.org mailing list
> >>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> James W. MacDonald, M.S.
> >>>> Biostatistician
> >>>> University of Washington
> >>>> Environmental and Occupational Health Sciences
> >>>> 4225 Roosevelt Way NE, # 100
> >>>> Seattle WA 98105-6099
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> > --
> > Hervé Pagès
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M1-B514
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fredhutch.org
> >
> > Phone:  (206) 667-5791
> > Fax:    (206) 667-1319
> >
>
>

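Vince's suggestion in the thread, to hint in the package startup message that attaching some other package would be useful, could be sketched as below. The file name zzz.R, the startup_hint() helper and the suggested package list are hypothetical. packageStartupMessage() is used so that users can silence the hint with suppressPackageStartupMessages().

```r
## Hypothetical zzz.R for a package such as 'foo'.
## Packages we would like users to attach themselves (an assumption).
suggested_pkgs <- c("GenomicRanges")

## Return a hint naming suggested packages that are not yet attached,
## or NULL when they are all already on the search path.
startup_hint <- function(pkgs = suggested_pkgs) {
  attached <- sub("^package:", "", search())
  missing <- setdiff(pkgs, attached)
  if (length(missing) == 0L)
    return(NULL)
  paste0("Results are easier to work with after attaching: ",
         paste(missing, collapse = ", "),
         "\n  e.g. library(", missing[1L], ")")
}

## Called by R when the package is attached with library().
.onAttach <- function(libname, pkgname) {
  hint <- startup_hint()
  if (!is.null(hint))
    packageStartupMessage(hint)
}
```

Because .onAttach() runs only when the package is attached, the hint does not appear when another package merely imports foo's namespace.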

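Hervé's point that loading a namespace, which is what an import leads to, puts nothing on the search path can be checked in any session. The sketch below uses the base package splines only because it ships with R and is not attached by default. show_load_vs_attach() is a hypothetical helper.

```r
## Compare the effect of loading a namespace with attaching a package.
show_load_vs_attach <- function(pkg) {
  on_search_path <- function() paste0("package:", pkg) %in% search()
  status <- function()
    c(loaded = pkg %in% loadedNamespaces(), attached = on_search_path())

  loadNamespace(pkg)                   # what an import triggers
  after_load <- status()
  library(pkg, character.only = TRUE)  # what a user's library() call does
  after_attach <- status()

  rbind(after_load, after_attach)
}

show_load_vs_attach("splines")
```

In a fresh session, the after_load row should show the namespace loaded but not attached; only the library() call adds package:splines to search().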

