[Bioc-devel] depends on packages providing classes

Thu Oct 30 03:50:47 CET 2014

On Wed, Oct 29, 2014 at 1:07 PM, Vincent Carey
<stvjc at channing.harvard.edu> wrote:
> On Wed, Oct 29, 2014 at 2:15 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
>> Hi,
>>
>> On 10/28/2014 08:51 PM, Vincent Carey wrote:
>>
>>>
>>>
>>> On Tue, Oct 28, 2014 at 5:48 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>>
>>>
>>>
>>>     On 10/28/2014 12:42 PM, Vincent Carey wrote:
>>>
>>>
>>>
>>>         On Tue, Oct 28, 2014 at 2:29 PM, Hervé Pagès
>>>         <hpages at fredhutch.org> wrote:
>>>
>>>              Hi,
>>>
>>>              On 10/28/2014 08:48 AM, Vincent Carey wrote:
>>>
>>>                  On Tue, Oct 28, 2014 at 11:23 AM, Kasper Daniel Hansen
>>>                  <kasperdanielhansen at gmail.com> wrote:
>>>
>>>                      Well, first I want to make sure that there is not
>>>                      something special regarding S4 methods and classes.
>>>                      I have a feeling that they are a special case.
>>>
>>>                      Second, while I agree with Jim's general opinion, it
>>>                      is a little bit different when I have return objects
>>>                      which are defined in other packages.  If I don't
>>>                      depend on this other package, the user is hosed wrt.
>>>                      the return object, unless I manually export all
>>>                      classes from this other
>>>
>>>
>>>                  In what sense?  If you return an instance of GRanges,
>>>                  certain things can be done even if GenomicRanges is not
>>>                  attached.
>>>
>>>
>>>              Yes certain things maybe, but it's hard to predict which ones.
>>>
>>>                    You can get values of slots, for
>>>                  example.
>>>
>>>                  With the following little package
>>>
>>>                  %vjcair> cat foo/NAMESPACE
>>>
>>>                  importFrom(IRanges, IRanges)
>>>
>>>                  importClassesFrom(GenomicRanges, GRanges)
>>>
>>>                  importFrom(GenomicRanges, GRanges)
>>>
>>>                  export(myfun)
>>>
>>>
>>>
>>>                  %vjcair> cat foo/DESCRIPTION
>>>
>>>                  Package: foo
>>>
>>>                  Title: foo
>>>
>>>                  Version: 0.0.0
>>>
>>>                  Author: VJ Carey <stvjc at channing.harvard.edu>
>>>
>>>                  Description:
>>>
>>>                  Suggests:
>>>
>>>                  Depends:
>>>
>>>                  Imports: GenomicRanges
>>>
>>>                  Maintainer: VJ Carey <stvjc at channing.harvard.edu>
>>>
>>>
>>>                  License: Private
>>>
>>>                  LazyLoad: yes
>>>
>>>
>>>
>>>                  %vjcair> cat foo/R/*
>>>
>>>                  myfun = function(seqnames="1", ranges=IRanges(1,2), ...)
>>>
>>>                       GRanges(seqnames=seqnames, ranges=ranges, ...)
>>>
>>>
>>>                  The following works:
>>>
>>>
>>>                      library(foo)
>>>
>>>
>>>                      x = myfun()
>>>
>>>
>>>                      x
>>>
>>>
>>>                  GRanges object with 1 range and 0 metadata columns:
>>>
>>>                          seqnames    ranges strand
>>>
>>>                             <Rle> <IRanges>  <Rle>
>>>
>>>                      [1]        1    [1, 2]      *
>>>
>>>                      -------
>>>
>>>                      seqinfo: 1 sequence from an unspecified genome; no
>>>         seqlengths
>>>
>>>
>>>                  So the show method works, even though I have not touched
>>>                  it.  (I did not expect it to work, in fact.)
>>>
>>>
>>>              Exactly. Let's call it luck ;-)
>>>
>>>                    Additionally, I can get access to slots.
>>>
>>>
>>>              The end user should never try to access slots directly but
>>>              use getters and setters instead. And most getters and setters
>>>              for GRanges objects are defined and documented in the
>>>              GenomicRanges package. Those that are not are defined in
>>>              packages that GenomicRanges depends on.
>>>
>>>                  But ranges() fails.  If I, the user, want to use it, I need
>>>                  to arrange for that.
>>>
>>>
>>>              IMO if your package returns a GRanges object to the user,
>>>              then the user should be able to access the man page for
>>>              GRanges objects with ?GRanges.
>>>
>>>
>>>         Oddly enough, that seems to be incorrect.  I added a man page to
>>>         foo that has a \link[GenomicRanges]{GRanges-class}.  I ran
>>>         help.start and the cross reference from my man page succeeds.
>>>         Furthermore with the sessionInfo below, ?GRanges succeeds at the
>>>         CLI.
>>>
>>>
>>>     Did you try to run example(GRanges)? I'm not sure that will work.
>>>
>>>
>>> Correct.  Cursory look at source shows that help() uses loadedNamespaces()
>>> to find the help file.  example() could probably do likewise.
>>>
>>
>> Sounds reasonable. So it seems that some recent changes in R make
>> it possible to access the man page and examples for stuff that
>> is imported but not attached. This is an important shift in paradigm
>> to me. In the past I would just rely on the simple notion that
>> what I can access with ? or example() reflects what's in my
>> search path. Now if I do ?DNAStringSet and it succeeds, I can't
>> assume DNAStringSet() is in my search path anymore. And if I
>> want to copy/paste a few commands from the examples in order to
>> try them in my session, they might fail because the package where
>> these examples belong is not necessarily attached.
>> I wonder whether that means we should now start every example
>> section with library(foo)? The rationale for not doing it so far
>>
>
> I think that would be excessive.  You are correct that some code will
> not run, and the user will have to decide what to do.  We have access to
> core members.  example() could be tuned to check for attachment of the
> package hosting the page and fail if the host package is not attached, with
> a hint as to how to proceed.  For cutting and pasting, caveat emptor.

That's already taken care of: example() attaches the package itself,
cf. https://github.com/wch/r-source/blob/trunk/src/library/utils/R/example.R#L53-L54

EXAMPLE:

$ R --vanilla

R Under development (unstable) (2014-10-26 r66879) -- "Unsuffered Consequences"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
[...]

> search()
[1] ".GlobalEnv"        "package:stats"     "package:graphics"
[4] "package:grDevices" "package:utils"     "package:datasets"
[7] "package:methods"   "Autoloads"         "package:base"
> example("md5sum", package="tools")

md5sum> as.vector(md5sum(dir(R.home(), pattern = "^COPY", full.names = TRUE)))
[1] "0cce1e42ef3fb133940946534fcf8896"

> search()
 [1] ".GlobalEnv"        "package:tools"     "package:stats"
 [4] "package:graphics"  "package:grDevices" "package:utils"
 [7] "package:datasets"  "package:methods"   "Autoloads"
[10] "package:base"
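
To make the loaded-versus-attached distinction concrete, here is a minimal
sketch using only base R.  The helper names is_loaded() and is_attached() are
hypothetical (they are not existing functions); 'tools' is used only because
it is the package in the session above.

```r
# Hypothetical helpers: a namespace can be loaded without its package
# being on the search path.
is_loaded   <- function(pkg) pkg %in% loadedNamespaces()
is_attached <- function(pkg) paste0("package:", pkg) %in% search()

# Load the namespace of 'tools' without attaching it.
invisible(loadNamespace("tools"))
is_loaded("tools")      # TRUE
is_attached("tools")    # FALSE in a fresh 'R --vanilla' session

# Exported functions of a loaded-but-unattached package are reachable via '::'.
tools::file_ext("NAMESPACE.txt")

# Attaching is a separate step; only this changes search().
library(tools)
is_attached("tools")    # TRUE
```

This is roughly the situation a user is in after a package has been pulled in
through Imports: the help pages may be found, but unqualified calls only work
once the package is attached.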

>
>
>> was that if you can access the man page with ? then that means
>> the package is already attached.

...but maybe it wouldn't hurt to be explicit and add a library("...")
at the top, just as we do everywhere else including vignettes and
package test scripts.

/Henrik

>>
>> As a side note the decision to extend the scope of ? to attached
>> packages and not to all installed packages feels arbitrary to me.
>> Going all the way would make ? even more useful and would be
>> consistent with what I see when navigating the documentation in
>> a browser. So when the user wants to call DNAStringSet() but
>> doesn't remember where it lives, ?DNAStringSet would be a quick
>> and easy way to know, and this whether the package is loaded via
>> a namespace or not.
>>
>
> I think this is a reasonable objective.
>
>
>>
>> Anyway, to get back to the original topic, IMO this change in R
>> still doesn't justify changing the Depends vs Imports game. I see
>> at least 3 strong cases for using 'Depends: A' instead of 'Imports: A'
>> in package B:
>>   (1) B defines (and exports) a class that extends a class defined in A.
>>
>
> In my view there is a risk of needless namespace pollution in this case.
> Depends seems extreme, other things being equal.  Better to let the user
> determine in real time whether this should occur.  It seems to me that,
> particularly when packages have lots of complicated interrelationships,
> it is best to have the developers manage symbols internally to the code,
> reducing as much as possible the impact on the user's environment.
> Minimizing the use of Depends seems consistent with this.
>
>
>>   (2) B defines (and exports) methods for a generic defined in A.
>>   (3) B defines (and exports) functions or methods that return
>>       objects of a class defined in package A.
>>
>> 'Imports: A' should be reserved to situations where A is used
>> internally by B and in a way that is B's internal business only
>> and none of the end-user's business. A typical example is the
>> internal use of RSQLite and biomaRt in GenomicFeatures.
>>
>
> I'm sympathetic to this view but would rather be out of the business of
> figuring out what the end-user's business is apart from using and
> getting value from the functions defined in the package that I contributed.
>
> Leaving the attachments up to the user is one way.
>
>
>>
>> I can see the attractiveness of trying to minimize what gets attached
>> to the user's session but I'm also concerned that trying to go too far
>> in that direction ultimately has no real benefit and can hurt the
>> user-friendliness of the software.
>>
>
> We should try to assemble data on this concern.  I don't know how to do it.
>
>
>>
>> H.
>>
>>
>>>
>>>     For example after I do library(rtracklayer), I can indeed do
>>>     ?DNAStringSet at the command line (I'm surprised this works), but
>>>     then example(DNAStringSet) fails:
>>>
>>>        > example(DNAStringSet)
>>>        Warning message:
>>>        In example(DNAStringSet) : no help found for ‘DNAStringSet’
>>>
>>>     I'm also surprised this is just a warning but that's another story...
>>>
>>>     H.
>>>
>>>         I am not trying to defend the NOTE but the principle of minimizing
>>>         Depends declarations needs to be considered critically, and I am
>>>         just exploring the space.
>>>
>>>           > ?GRanges  # it worked as usual in the tty
>>>
>>>           > sessionInfo()
>>>
>>>         R version 3.1.1 (2014-07-10)
>>>
>>>         Platform: x86_64-apple-darwin13.1.0 (64-bit)
>>>
>>>
>>>         locale:
>>>
>>>         [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>>
>>>
>>>         attached base packages:
>>>
>>>         [1] stats     graphics  grDevices datasets  utils     tools     methods
>>>         [8] base
>>>
>>>
>>>         other attached packages:
>>>
>>>         [1] foo_0.0.0            rmarkdown_0.3.8      knitr_1.6
>>>
>>>         [4] weaver_1.31.0        codetools_0.2-9      digest_0.6.4
>>>
>>>         [7] BiocInstaller_1.16.0
>>>
>>>
>>>         loaded via a namespace (and not attached):
>>>
>>>            [1] BiocGenerics_0.11.5   evaluate_0.5.5        formatR_1.0
>>>
>>>            [4] GenomeInfoDb_1.1.26   GenomicRanges_1.17.48 htmltools_0.2.6
>>>
>>>            [7] IRanges_1.99.32       parallel_3.1.1        S4Vectors_0.2.8
>>>
>>>         [10] stats4_3.1.1          stringr_0.6.2         XVector_0.5.8
>>>
>>>              And that works only if the GenomicRanges package is attached.
>>>              Attaching GenomicRanges will also attach other packages that
>>>              GenomicRanges depends on where some GRanges accessors might
>>>              be defined and documented (e.g. metadata()).
>>>
>>>
>>>
>>>                  In some cases you'll decide you want the user to have a
>>>                  full complement of methods for your package to function
>>>                  meaningfully.  For example, I am considering using dplyr
>>>                  idioms to work with data structures in a package, and it
>>>                  seems I should just depend on dplyr rather than pick out
>>>                  and document which things I want to expose.  But that may
>>>                  still be an undesirable design.
>>>
>>>
>>>                      package, like
>>>                          importClassesFrom("GenomicRanges", "GRanges")
>>>                          exportClasses("GRanges")
>>>                      Surely that is not intended.
>>>
>>>                      It is important that my package works without being
>>>                      attached to the search path and I do this by carefully
>>>                      importing what I need, i.e. my code does not require
>>>                      that my dependencies are attached to the search path.
>>>                      But the end user will be hosed without it.
>>>
>>>
>>>              Yes s/he will. Fortunately when your package namespace gets
>>>              loaded by another package, then nothing gets attached to the
>>>              search path, even if your package depends (instead of
>>>              imports) on other packages. So using Depends instead of
>>>              Imports for your own dependencies won't make any difference
>>>              in that respect, which is good.
>>>
>>>
>>>                      My impression is that the NOTE in R CMD check was
>>>                      written by someone who did not anticipate large-scale
>>>                      use and re-use of classes and methods across many
>>>                      packages.
>>>
>>>
>>>              That's my impression too.
>>>
>>>              Cheers,
>>>              H.
>>>
>>>
>>>                      Best,
>>>                      Kasper
>>>
>>>
>>>                      On Tue, Oct 28, 2014 at 11:14 AM, James W. MacDonald
>>>                      <jmacdon at uw.edu> wrote:
>>>
>>>                          I agree with Vince. It's your job as a package
>>>                          developer to make available to your package all the
>>>                          functions necessary for the package to work. But I
>>>                          am not sure it is your job to load all the packages
>>>                          that your end user might need.
>>>
>>>                          Best,
>>>
>>>                          Jim
>>>
>>>
>>>
>>>                          On Tue, Oct 28, 2014 at 11:04 AM, Vincent Carey
>>>                          <stvjc at channing.harvard.edu> wrote:
>>>
>>>                              On Tue, Oct 28, 2014 at 10:19 AM, Kasper Daniel Hansen
>>>                              <kasperdanielhansen at gmail.com> wrote:
>>>
>>>                                  What is the current best paradigm for using all
>>>                                  the classes in
>>>                                  S4Vectors/GenomeInfoDb/GenomicRanges/IRanges?
>>>
>>>
>>>
>>>                                  I obviously import methods and classes
>>>         from the
>>>                                  relevant packages.
>>>
>>>                                  But shouldn't I depend on these packages as
>>>                                  well?  Since I basically want the user to have
>>>                                  this functionality at the command line? That is
>>>                                  what I do now.
>>>
>>>
>>>                              I've wondered about this as well.  It seems the
>>>                              principle is that the user should take care of
>>>                              attaching additional packages when needed.  It
>>>                              might be appropriate to give a hint in the package
>>>                              startup message, if having some other package
>>>                              attached would typically be of great utility.
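
The startup-message hint suggested above could look roughly like the sketch
below.  It assumes a hypothetical package whose R/zzz.R defines .onAttach();
the wording of the message and the choice of GenomicRanges as the package to
suggest are illustrative assumptions only.

```r
# Sketch of R/zzz.R for a hypothetical package.  .onAttach() is the standard
# hook that R runs when the package is attached via library().
.onAttach <- function(libname, pkgname) {
  helper <- "GenomicRanges"   # assumption: the package worth suggesting
  # Only give the hint if the helper package is not already on the search path.
  if (!paste0("package:", helper) %in% search()) {
    packageStartupMessage(
      "Objects returned by ", pkgname, " are easier to work with when ",
      helper, " is attached; consider library(", helper, ")."
    )
  }
  invisible(NULL)
}
```

Because the hint goes through packageStartupMessage(), a user who does not
want it can silence it with suppressPackageStartupMessages(library(foo)).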
>>>
>>>                              Given your list above, I would think that depending
>>>                              on GenomicRanges would often be sufficient, and
>>>                              IRanges/S4Vectors would not require dependency
>>>                              assertion.  I would think that GenomeInfoDb should
>>>                              be a voluntary attachment for a specific session.
>>>
>>>                              These are just my guesses -- I doubt there will be
>>>                              complete consensus, but I have started to think
>>>                              very critically about using Depends, and I think it
>>>                              is better when its use is minimized.
>>>
>>>
>>>                                  That of course leads to the R CMD check NOTE on
>>>                                  depending on too many packages.... I guess I
>>>                                  should ignore that one.
>>>
>>>                                  Best,
>>>                                  Kasper
>>>
>>>                                            [[alternative HTML version
>>>         deleted]]
>>>
>>>
>>>         ___________________________________________________
>>>         Bioc-devel at r-project.org mailing list
>>>         https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>>
>>>
>>>
>>>
>>>                          --
>>>                          James W. MacDonald, M.S.
>>>                          Biostatistician
>>>                          University of Washington
>>>                          Environmental and Occupational Health Sciences
>>>                          4225 Roosevelt Way NE, # 100
>>>                          Seattle WA 98105-6099
>>>
>>>
>>>
>>>
>>>
>>>
>>>              --
>>>              Hervé Pagès
>>>
>>>              Program in Computational Biology
>>>              Division of Public Health Sciences
>>>              Fred Hutchinson Cancer Research Center
>>>              1100 Fairview Ave. N, M1-B514
>>>              P.O. Box 19024
>>>              Seattle, WA 98109-1024
>>>
>>>              E-mail: hpages at fredhutch.org
>>>              Phone: (206) 667-5791
>>>              Fax: (206) 667-1319
>>>
>>>
>>>
>>>
>>>
>>>
>>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel