[Bioc-devel] depends on packages providing classes

Hervé Pagès hpages at fredhutch.org
Thu Oct 30 04:00:25 CET 2014


On 10/29/2014 01:07 PM, Vincent Carey wrote:
>
>
> On Wed, Oct 29, 2014 at 2:15 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
>     Hi,
>
>     On 10/28/2014 08:51 PM, Vincent Carey wrote:
>
>
>
>         On Tue, Oct 28, 2014 at 5:48 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
>
>
>              On 10/28/2014 12:42 PM, Vincent Carey wrote:
>
>
>
>                  On Tue, Oct 28, 2014 at 2:29 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
>                       Hi,
>
>                       On 10/28/2014 08:48 AM, Vincent Carey wrote:
>
>                           On Tue, Oct 28, 2014 at 11:23 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
>
>                               Well, first I want to make sure that there is not something
>                               special regarding S4 methods and classes. I have a feeling
>                               that they are a special case.
>
>                               Second, while I agree with Jim's general opinion, it is a
>                               little bit different when I have return objects which are
>                               defined in other packages. If I don't depend on this other
>                               package, the user is hosed wrt. the return object, unless I
>                               manually export all classes from this other
>
>
>                           In what sense?  If you return an instance of GRanges, certain
>                           things can be done even if GenomicRanges is not attached.
>
>
>                       Yes certain things maybe, but it's hard to predict which ones.
>
>                             You can get values of slots, for example.
>
>                           With the following little package
>
>                           %vjcair> cat foo/NAMESPACE
>                           importFrom(IRanges, IRanges)
>                           importClassesFrom(GenomicRanges, GRanges)
>                           importFrom(GenomicRanges, GRanges)
>                           export(myfun)
>
>                           %vjcair> cat foo/DESCRIPTION
>                           Package: foo
>                           Title: foo
>                           Version: 0.0.0
>                           Author: VJ Carey <stvjc at channing.harvard.edu>
>                           Description:
>                           Suggests:
>                           Depends:
>                           Imports: GenomicRanges
>                           Maintainer: VJ Carey <stvjc at channing.harvard.edu>
>                           License: Private
>                           LazyLoad: yes
>
>                           %vjcair> cat foo/R/*
>                           myfun = function(seqnames="1", ranges=IRanges(1,2), ...)
>                                GRanges(seqnames=seqnames, ranges=ranges, ...)
>
>
>                           The following works:
>
>                               library(foo)
>                               x = myfun()
>                               x
>
>                           GRanges object with 1 range and 0 metadata columns:
>                                   seqnames    ranges strand
>                                      <Rle> <IRanges>  <Rle>
>                               [1]        1    [1, 2]      *
>                               -------
>                               seqinfo: 1 sequence from an unspecified genome; no seqlengths
>
>
>                           So the show method works, even though I have not touched it.
>                           (I did not expect it to work, in fact.)
>
>
>                       Exactly. Let's call it luck ;-)
>
>                             Additionally, I can get access to slots.
>
>
>                       The end user should never try to access slots directly but
>                       use getters and setters instead. And most getters and setters
>                       for GRanges objects are defined and documented in the
>                       GenomicRanges package. Those that are not are defined in
>                       packages that GenomicRanges depends on.
>
>                             But ranges() fails.  If I, the user, want to use it, I
>                             need to arrange for that.
>
>
>                       IMO if your package returns a GRanges object to the user,
>                       then the user should be able to access the man page for
>                       GRanges objects with ?GRanges.
>
>
>                  Oddly enough, that seems to be incorrect.  I added a man page to
>                  foo that has a \link[GenomicRanges]{GRanges-class}.  I ran
>                  help.start and the cross reference from my man page succeeds.
>                  Furthermore with the sessionInfo below, ?GRanges succeeds at the
>                  CLI.
>
>
>              Did you try to run example(GRanges)? I'm not sure that will work.
>
>
>         Correct.  A cursory look at the source shows that help() uses
>         loadedNamespaces() to find the help file.  example() could probably
>         do likewise.
>
>
>     Sounds reasonable. So it seems that some recent changes in R make
>     it possible to access the man page and examples for stuff that
>     is imported but not attached. This is an important shift in paradigm
>     to me. In the past I would just rely on the simple notion that
>     what I can access with ? or example() reflects what's in my
>     search path. Now if I do ?DNAStringSet and it succeeds, I can't
>     assume DNAStringSet() is in my search path anymore. And if I
>     want to copy/paste a few commands from the examples in order to
>     try them in my session, they might fail because the package where
>     these examples belong is not necessarily attached.
>     I wonder whether that means we should now start every example
>     section with library(foo)? The rationale for not doing it so far
>
>
> I think that would be excessive.  You are correct that some code will
> not run, and the user will have to decide what to do.  We have access to
> core members.  example() could be tuned to check for attachment of the
> package hosting the page and fail if the host package is not attached, with
> a hint as to how to proceed.  For cutting and pasting, caveat emptor.
>
>     was that if you can access the man page with ? then that means
>     the package is already attached.
>
>     As a side note, the decision to extend the scope of ? to loaded
>     packages and not to all installed packages feels arbitrary to me.
>     Going all the way would make ? even more useful and would be
>     consistent with what I see when navigating the documentation in
>     a browser. So when the user wants to call DNAStringSet() but
>     doesn't remember where it lives, ?DNAStringSet would be a quick
>     and easy way to know, and this whether the package is loaded via
>     a namespace or not.
>
>
> I think this is a reasonable objective.
>
>
>     Anyway, to get back to the original topic, IMO this change in R
>     still doesn't justify changing the Depends vs Imports game. I see
>     at least 3 strong cases for using 'Depends: A' instead of 'Imports: A'
>     in package B:
>        (1) B defines (and exports) a class that extends a class defined in A.
>
>
> In my view there is a risk of needless namespace pollution in this case.
> Depends seems extreme, other things being equal.  Better to let the user
> determine in real time whether this should occur.  It seems to me that,
> particularly when packages have lots of complicated interrelationships,
> it is best to have the developers manage symbols internally to the code,
> reducing as much as possible the impact on the user environment.
> Minimizing the use of Depends seems consistent with this.
>
>        (2) B defines (and exports) methods for a generic defined in A.
>        (3) B defines (and exports) functions or methods that return
>            objects of a class defined in package A.
>
>     'Imports: A' should be reserved to situations where A is used
>     internally by B and in a way that is B's internal business only
>     and none of the end-user's business. A typical example is the
>     internal use of RSQLite and biomaRt in GenomicFeatures.
>
>
> I'm sympathetic to this view but would rather be out of the business of
> figuring out what the end-user's business is apart from using and
> getting value from the functions defined in the package that I contributed.
> Leaving the attachments up to the user is one way.
>
>
>     I can see the attractiveness of trying to minimize what gets attached
>     to the user's session but I'm also concerned that trying to go too far
>     in that direction ultimately has no real benefit and can hurt the
>     user-friendliness of the software.
>
>
> We should try to assemble data on this concern.  I don't know how to do it.

Well, user-friendliness is hard to measure because it can be very
subjective. Personally I don't feel that my package B is the most
user-friendly if my functions return objects of a class defined
in package A and if A is not in Depends. If I know in advance that
my users will almost always need to attach A before they can do
anything with these objects, then I'd rather do that for them.
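
To make the loaded-versus-attached distinction concrete, here is a
minimal sketch. It uses stats4 purely as a stand-in for package A,
because it ships with R; the variable names are mine and nothing here
is specific to GenomicRanges. Loading the namespace is roughly what
happens to A when B only imports it:

```r
# stats4 stands in for "package A" (an assumption made for illustration only).
pkg <- "stats4"
search_entry <- paste0("package:", pkg)

# Remember whether the package was attached before we touched it.
attached_before <- search_entry %in% search()

# Roughly what happens to A when B lists it in Imports: the namespace
# gets loaded, but nothing is added to the search path.
loadNamespace(pkg)

is_loaded      <- pkg %in% loadedNamespaces()
attached_after <- search_entry %in% search()

# Exported objects are reachable with an explicit lookup ...
mle_fun <- getExportedValue(pkg, "mle")

# ... but a bare name is only found if the package is attached.
bare_name_found <- exists("mle", mode = "function")

cat("loaded:", is_loaded, "\n")
cat("attached before/after loadNamespace():", attached_before, attached_after, "\n")
cat("bare name found:", bare_name_found, "\n")
```

In a fresh session I would expect the namespace to be loaded, the
search path to be unchanged, and the bare name not to be found until
the user calls library(stats4) themselves.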

Note that it's different from trying to anticipate any possible
use of these objects by my users, which I agree is a business
I'd rather stay out of.
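
The split I have in mind can be sketched as DESCRIPTION fields for a
hypothetical package B. The package name, the version, and the choice
of RSQLite and biomaRt as internal-only imports are illustrative
assumptions (the latter borrowed from the GenomicFeatures example
earlier in the thread). The snippet only writes and re-reads the
fields with base R, so none of those packages needs to be installed:

```r
# Hypothetical package B: field values chosen only to illustrate the
# Depends/Imports split discussed in this thread.
desc <- data.frame(
  Package = "B",
  Version = "0.0.1",
  Depends = "GenomicRanges",      # defines the class of the objects B returns
  Imports = "RSQLite, biomaRt",   # used internally, not the user's business
  stringsAsFactors = FALSE
)

# Write the fields in DESCRIPTION (DCF) format and show the result.
desc_file <- tempfile("DESCRIPTION")
write.dcf(desc, file = desc_file)
cat(readLines(desc_file), sep = "\n")

# Read the two dependency fields back as a character matrix.
fields <- read.dcf(desc_file, fields = c("Depends", "Imports"))
```

Only the package that defines the returned class goes in Depends;
everything that is B's internal business stays in Imports.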

H.

>
>
>     H.
>
>
>
>              For example after I do library(rtracklayer), I can indeed do
>              ?DNAStringSet at the command line (I'm surprised this works), but
>              then example(DNAStringSet) fails:
>
>                 > example(DNAStringSet)
>                 Warning message:
>                 In example(DNAStringSet) : no help found for ‘DNAStringSet’
>
>              I'm also surprised this is just a warning but that's another story...
>
>              H.
>
>                  I am not trying to defend the NOTE but the principle of
>                  minimizing Depends declarations needs to be considered
>                  critically, and I am just exploring the space.
>
>                  > ?GRanges  # it worked as usual in the tty
>
>                  > sessionInfo()
>                  R version 3.1.1 (2014-07-10)
>                  Platform: x86_64-apple-darwin13.1.0 (64-bit)
>
>                  locale:
>                  [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
>                  attached base packages:
>                  [1] stats     graphics  grDevices datasets  utils     tools     methods
>                  [8] base
>
>                  other attached packages:
>                  [1] foo_0.0.0            rmarkdown_0.3.8      knitr_1.6
>                  [4] weaver_1.31.0        codetools_0.2-9      digest_0.6.4
>                  [7] BiocInstaller_1.16.0
>
>                  loaded via a namespace (and not attached):
>                   [1] BiocGenerics_0.11.5   evaluate_0.5.5        formatR_1.0
>                   [4] GenomeInfoDb_1.1.26   GenomicRanges_1.17.48 htmltools_0.2.6
>                   [7] IRanges_1.99.32       parallel_3.1.1        S4Vectors_0.2.8
>                  [10] stats4_3.1.1          stringr_0.6.2         XVector_0.5.8
>
>                       And that works only if the GenomicRanges package is attached.
>                       Attaching GenomicRanges will also attach other packages that
>                       GenomicRanges depends on where some GRanges accessors might be
>                       defined and documented (e.g. metadata()).
>
>
>
>                           In some cases you'll decide you want the user to have a full
>                           complement of methods for your package to function
>                           meaningfully.  For example, I am considering using dplyr
>                           idioms to work with data structures in a package, and it
>                           seems I should just depend on dplyr rather than pick out and
>                           document which things I want to expose.  But that may still
>                           be an undesirable design.
>
>
>                               package, like
>
>                                   importClassesFrom("GenomicRanges", "GRanges")
>                                   exportClasses("GRanges")
>
>                               Surely that is not intended.
>
>                               It is important that my package works without being
>                               attached to the search path and I do this by carefully
>                               importing what I need, ie. my code does not require that
>                               my dependencies are attached to the search path.  But the
>                               end user will be hosed without it.
>
>
>                       Yes s/he will. Fortunately when your package namespace gets
>                       loaded by another package, then nothing gets attached to the
>                       search path, even if your package depends (instead of imports)
>                       on other packages. So using Depends instead of Imports for
>                       your own dependencies won't make any difference in that
>                       respect, which is good.
>
>
>                               My impression is that the NOTE in R CMD check was written
>                               by someone who did not anticipate large-scale use and
>                               re-use of classes and methods across many packages.
>
>
>                       That's my impression too.
>
>                       Cheers,
>                       H.
>
>
>                               Best,
>                               Kasper
>
>
>                               On Tue, Oct 28, 2014 at 11:14 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
>                                   I agree with Vince. It's your job as a package developer
>                                   to make available to your package all the functions
>                                   necessary for the package to work. But I am not sure it is
>                                   your job to load all the packages that your end user
>                                   might need.
>
>                                   Best,
>
>                                   Jim
>
>
>
>                                   On Tue, Oct 28, 2014 at 11:04 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
>
>                                       On Tue, Oct 28, 2014 at 10:19 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
>
>                                           What is the current best paradigm for using all
>                                           the classes in
>
>                                               S4Vectors/GenomeInfoDb/GenomicRanges/IRanges
>
>                                           I obviously import methods and classes from the
>                                           relevant packages.
>
>                                           But shouldn't I depend on these packages as
>                                           well?  Since I basically want the user to have this
>                                           functionality at the command line? That is what I do
>                                           now.
>
>
>                                       I've wondered about this as well.  It seems the
>                                       principle is that the user should take care of attaching
>                                       additional packages when needed.  It might be
>                                       appropriate to give a hint in the package startup
>                                       message, if having some other package attached would
>                                       typically be of great utility.
>
>                                       Given your list above, I would think that depending
>                                       on GenomicRanges would often be sufficient, and
>                                       IRanges/S4Vectors would not require dependency
>                                       assertion.  I would think that GenomeInfoDb should be
>                                       a voluntary attachment for a specific session.
>
>                                       These are just my guesses -- I doubt there will be
>                                       complete consensus, but I have started to think very
>                                       critically about using Depends, and I think it is
>                                       better when its use is minimized.
>
>
>                                           That of course leads to the R CMD check NOTE on
>                                           depending on too many packages.... I guess I should
>                                           ignore that one.
>
>                                           Best,
>                                           Kasper
>
>                                           [[alternative HTML version deleted]]
>
>                                           _______________________________________________
>                                           Bioc-devel at r-project.org mailing list
>                                           https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
>
>
>
>
>                                   --
>                                   James W. MacDonald, M.S.
>                                   Biostatistician
>                                   University of Washington
>                                   Environmental and Occupational Health
>         Sciences
>                                   4225 Roosevelt Way NE, # 100
>                                   Seattle WA 98105-6099
>
>
>
>
>
>                       --
>                       Hervé Pagès
>
>                       Program in Computational Biology
>                       Division of Public Health Sciences
>                       Fred Hutchinson Cancer Research Center
>                       1100 Fairview Ave. N, M1-B514
>                       P.O. Box 19024
>                       Seattle, WA 98109-1024
>
>                       E-mail: hpages at fredhutch.org
>                       Phone: (206) 667-5791
>                       Fax: (206) 667-1319
>
>
>
>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


