[Bioc-devel] depends on packages providing classes
Henrik Bengtsson
hb at biostat.ucsf.edu
Thu Oct 30 03:50:47 CET 2014
On Wed, Oct 29, 2014 at 1:07 PM, Vincent Carey
<stvjc at channing.harvard.edu> wrote:
> On Wed, Oct 29, 2014 at 2:15 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>
>> Hi,
>>
>> On 10/28/2014 08:51 PM, Vincent Carey wrote:
>>
>>>
>>>
>>> On Tue, Oct 28, 2014 at 5:48 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>>
>>>
>>>
>>> On 10/28/2014 12:42 PM, Vincent Carey wrote:
>>>
>>>
>>>
>>> On Tue, Oct 28, 2014 at 2:29 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
>>>
>>> Hi,
>>>
>>> On 10/28/2014 08:48 AM, Vincent Carey wrote:
>>>
>>> On Tue, Oct 28, 2014 at 11:23 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
>>>
>>> Well, first I want to make sure that there is not something special
>>> regarding S4 methods and classes. I have a feeling that they are a special
>>> case.
>>>
>>> Second, while I agree with Jim's general opinion, it is a little bit
>>> different when I have return objects which are defined in other packages.
>>> If I don't depend on this other package, the user is hosed wrt. the return
>>> object, unless I manually export all classes from this other
>>>
>>>
>>> In what sense? If you return an instance of GRanges, certain things can be
>>> done even if GenomicRanges is not attached.
>>>
>>>
>>> Yes certain things maybe, but it's hard to predict which ones.
>>>
>>> You can get values of slots, for example.
>>>
>>> With the following little package
>>>
>>> %vjcair> cat foo/NAMESPACE
>>> importFrom(IRanges, IRanges)
>>> importClassesFrom(GenomicRanges, GRanges)
>>> importFrom(GenomicRanges, GRanges)
>>> export(myfun)
>>>
>>> %vjcair> cat foo/DESCRIPTION
>>> Package: foo
>>> Title: foo
>>> Version: 0.0.0
>>> Author: VJ Carey <stvjc at channing.harvard.edu>
>>> Description:
>>> Suggests:
>>> Depends:
>>> Imports: GenomicRanges
>>> Maintainer: VJ Carey <stvjc at channing.harvard.edu>
>>> License: Private
>>> LazyLoad: yes
>>>
>>> %vjcair> cat foo/R/*
>>> myfun = function(seqnames="1", ranges=IRanges(1,2), ...)
>>>   GRanges(seqnames=seqnames, ranges=ranges, ...)
>>>
>>>
>>> The following works:
>>>
>>> library(foo)
>>> x = myfun()
>>> x
>>> GRanges object with 1 range and 0 metadata columns:
>>>       seqnames    ranges strand
>>>          <Rle> <IRanges>  <Rle>
>>>   [1]        1    [1, 2]      *
>>>   -------
>>>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
>>>
>>> So the show method works, even though I have not touched it. (I did not
>>> expect it to work, in fact.)
>>>
>>>
>>> Exactly. Let's call it luck ;-)
>>>
>>> Additionally, I can get access to slots.
>>>
>>>
>>> The end user should never try to access slots directly but use getters
>>> and setters instead. And most getters and setters for GRanges objects
>>> are defined and documented in the GenomicRanges package. Those that are
>>> not are defined in packages that GenomicRanges depends on.
>>>
>>> But
>>> ranges()
>>>
>>> fails. If I, the user, want to use it, I need to
>>> arrange for that.
>>>
>>>
>>> IMO if your package returns a GRanges object to the user,
>>> then the user
>>> should be able to access the man page for GRanges objects
>>> with ?GRanges.
>>>
>>>
>>> Oddly enough, that seems to be incorrect. I added a man page to foo that has
>>> a \link[GenomicRanges]{GRanges-class}. I ran help.start and the cross
>>> reference from my man page succeeds. Furthermore with the sessionInfo
>>> below, ?GRanges succeeds at the CLI.
>>>
>>>
>>> Did you try to run example(GRanges)? I'm not sure that will work.
>>>
>>>
>>> Correct. Cursory look at source shows that help() uses loadedNamespaces()
>>> to find the help file. example() could probably do likewise.
>>>
>>
>> Sounds reasonable. So it seems that some recent changes in R make
>> it possible to access the man page and examples for stuff that
>> is imported but not attached. This is an important shift in paradigm
>> to me. In the past I would just rely on the simple notion that
>> what I can access with ? or example() reflects what's in my
>> search path. Now if I do ?DNAStringSet and it succeeds, I can't
>> assume DNAStringSet() is in my search path anymore. And if I
>> want to copy/paste a few commands from the examples in order to
>> try them in my session, they might fail because the package where
>> these examples belong is not necessarily attached.
>> I wonder whether that means we should now start every example
>> section with library(foo)? The rationale for not doing it so far
>>
>
> I think that would be excessive. You are correct that some code will
> not run, and the user will have to decide what to do. We have access to
> core members. example() could be tuned to check for attachment of the
> package hosting the page and fail if the host package is not attached, with
> a hint as to how to proceed. For cutting and pasting, caveat emptor.
That's already taken care of; example() already attaches the package,
cf. https://github.com/wch/r-source/blob/trunk/src/library/utils/R/example.R#L53-L54
EXAMPLE:
$ R --vanilla
R Under development (unstable) (2014-10-26 r66879) -- "Unsuffered Consequences"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
[...]
> search()
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:utils" "package:datasets"
[7] "package:methods" "Autoloads" "package:base"
> example("md5sum", package="tools")
md5sum> as.vector(md5sum(dir(R.home(), pattern = "^COPY", full.names = TRUE)))
[1] "0cce1e42ef3fb133940946534fcf8896"
> search()
[1] ".GlobalEnv" "package:tools" "package:stats"
[4] "package:graphics" "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods" "Autoloads"
[10] "package:base"
>
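
To make the loaded-versus-attached distinction concrete, here is a minimal
sketch using only the 'tools' package that ships with R. The variable names
are mine and purely illustrative; the point is only that loadNamespace()
leaves search() alone, whereas library() (which is what example() relies on)
changes it.

```r
# Base R only: 'tools' ships with R, so nothing needs installing.
was_attached <- "package:tools" %in% search()

# Loading the namespace makes tools::md5sum() reachable via '::',
# but it does not modify the search path.
loadNamespace("tools")
is_loaded <- "tools" %in% loadedNamespaces()
attached_after_load <- "package:tools" %in% search()

# Attaching is what library() does; afterwards 'package:tools' is on
# the search path and a bare md5sum resolves from the global environment.
library("tools")
attached_after_library <- "package:tools" %in% search()

print(c(loaded = is_loaded,
        attached_after_load = attached_after_load,
        attached_after_library = attached_after_library))
```

In a fresh session I would expect the middle value to match whatever the
attachment state was before loadNamespace() was called.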
>
>> was that if you can access the man page with ? then that means
>> the package is already attached.
...but maybe it wouldn't hurt to be explicit and add a library("...")
at the top, just as we do everywhere else including vignettes and
package test scripts.
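
As a rough sketch of what such an explicit examples section might look
like, again with 'tools' standing in for the package that owns the help
page (the temporary file and the variable names are illustrative
assumptions, not taken from any real man page):

```r
# Attach the package explicitly so the example also works when it is
# copy/pasted into a session where the namespace is merely loaded.
library("tools")

# Write a small temporary file so the example does not depend on the
# layout of the R installation.
tmp <- tempfile(fileext = ".txt")
writeLines("hello", tmp)

# md5sum() returns a character vector named by file path; drop the names.
digest <- unname(md5sum(tmp))
print(digest)

# Clean up the temporary file.
unlink(tmp)
```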
/Henrik
>>
>> As a side note the decision to extend the scope of ? to attached
>> packages and not to all installed packages feels arbitrary to me.
>> Going all the way would make ? even more useful and would be
>> consistent with what I see when navigating the documentation in
>> a browser. So when the user wants to call DNAStringSet() but
>> doesn't remember where it lives, ?DNAStringSet would be a quick
>> and easy way to know, and this whether the package is loaded via
>> a namespace or not.
>>
>
> I think this is a reasonable objective.
>
>
>>
>> Anyway, to get back to the original topic, IMO this change in R
>> still doesn't justify changing the Depends vs Imports game. I see
>> at least 3 strong cases for using 'Depends: A' instead of 'Imports: A'
>> in package B:
>> (1) B defines (and exports) a class that extends a class defined in A.
>>
>
> In my view there is a risk of needless namespace pollution in this case.
> Depends seems extreme, other things being equal. Better to let the user
> determine in real time whether this should occur. It seems to me that,
> particularly when packages have lots of complicated interrelationships,
> it is best to have the developers manage symbols internally to the code,
> reducing as much as possible the impact on the user's environment.
> Minimizing the use of Depends seems consistent with this.
>
>
>> (2) B defines (and exports) methods for a generic defined in A.
>> (3) B defines (and exports) functions or methods that return
>> objects of a class defined in package A.
>>
>> 'Imports: A' should be reserved to situations where A is used
>> internally by B and in a way that is B's internal business only
>> and none of the end-user's business. A typical example is the
>> internal use of RSQLite and biomaRt in GenomicFeatures.
>>
>
> I'm sympathetic to this view but would rather be out of the business of
> figuring out what the end-user's business is apart from using and
> getting value from the functions defined in the package that I contributed.
>
> Leaving the attachments up to the user is one way.
>
>
>>
>> I can see the attractiveness of trying to minimize what gets attached
>> to the user's session but I'm also concerned that trying to go too far
>> in that direction ultimately has no real benefit and can hurt the
>> user-friendliness of the software.
>>
>
> We should try to assemble data on this concern. I don't know how to do it.
>
>
>>
>> H.
>>
>>
>>>
>>> For example after I do library(rtracklayer), I can indeed do
>>> ?DNAStringSet at the command line (I'm surprised this works), but
>>> then example(DNAStringSet) fails:
>>>
>>> > example(DNAStringSet)
>>> Warning message:
>>> In example(DNAStringSet) : no help found for ‘DNAStringSet’
>>>
>>> I'm also surprised this is just a warning but that's another story...
>>>
>>> H.
>>>
>>> I am not trying to defend the NOTE but the principle of minimizing
>>> Depends declarations needs to be considered critically, and I am just
>>> exploring the space.
>>>
>>> > ?GRanges # it worked as usual in the tty
>>>
>>> > sessionInfo()
>>> R version 3.1.1 (2014-07-10)
>>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices datasets utils tools methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] foo_0.0.0 rmarkdown_0.3.8 knitr_1.6
>>> [4] weaver_1.31.0 codetools_0.2-9 digest_0.6.4
>>> [7] BiocInstaller_1.16.0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] BiocGenerics_0.11.5 evaluate_0.5.5 formatR_1.0
>>> [4] GenomeInfoDb_1.1.26 GenomicRanges_1.17.48 htmltools_0.2.6
>>> [7] IRanges_1.99.32 parallel_3.1.1 S4Vectors_0.2.8
>>> [10] stats4_3.1.1 stringr_0.6.2 XVector_0.5.8
>>>
>>> And that works only if the GenomicRanges package is attached. Attaching
>>> GenomicRanges will also attach other packages that GenomicRanges depends
>>> on where some GRanges accessors might be defined and documented (e.g.
>>> metadata()).
>>>
>>>
>>>
>>> In some cases you'll decide you want the user to have a full complement of
>>> methods for your package to function meaningfully. For example, I am
>>> considering using dplyr idioms to work with data structures in a package,
>>> and it seems I should just depend on dplyr rather than pick out and
>>> document which things I want to expose. But that may still be an
>>> undesirable design.
>>>
>>>
>>> package, like
>>> importClassesFrom("GenomicRanges", "GRanges")
>>> exportClasses("GRanges")
>>> Surely that is not intended.
>>>
>>> It is important that my package works without being attached to the search
>>> path and I do this by carefully importing what I need, ie. my code does not
>>> require that my dependencies are attached to the search path. But the end
>>> user will be hosed without it.
>>>
>>>
>>> Yes s/he will. Fortunately when your package namespace gets loaded by
>>> another package, then nothing gets attached to the search path, even if
>>> your package depends (instead of imports) on other packages. So using
>>> Depends instead of Imports for your own dependencies won't make any
>>> difference in that respect, which is good.
>>>
>>>
>>> My impression is that the NOTE in R CMD check was written by someone who
>>> did not anticipate large-scale use and re-use of classes and methods across
>>> many packages.
>>>
>>>
>>> That's my impression too.
>>>
>>> Cheers,
>>> H.
>>>
>>>
>>> Best,
>>> Kasper
>>>
>>>
>>> On Tue, Oct 28, 2014 at 11:14 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>>>
>>> I agree with Vince. It's your job as a package developer to make
>>> available to your package all the functions necessary for the package to
>>> work. But I am not sure it is your job to load all the packages that your
>>> end user might need.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>> On Tue, Oct 28, 2014 at 11:04 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
>>>
>>> On Tue, Oct 28, 2014 at 10:19 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
>>>
>>> What is the current best paradigm for using all the classes in
>>> S4Vectors/GenomeInfoDb/GenomicRanges/IRanges
>>>
>>> I obviously import methods and classes from the relevant packages.
>>>
>>> But shouldn't I depend on these packages as well? Since I basically want
>>> the user to have this functionality at the command line? That is what I do
>>> now.
>>>
>>>
>>> I've wondered about this as well. It seems the principle is that the user
>>> should take care of attaching additional packages when needed. It might be
>>> appropriate to give a hint in the package startup message, if having some
>>> other package attached would typically be of great utility.
>>>
>>> Given your list above, I would think that depending on GenomicRanges would
>>> often be sufficient, and IRanges/S4Vectors would not require dependency
>>> assertion. I would think that GenomeInfoDb should be a voluntary
>>> attachment for a specific session.
>>>
>>> These are just my guesses -- I doubt there will be complete consensus, but
>>> I have started to think very critically about using Depends, and I think
>>> it is better when its use is minimized.
>>>
>>>
>>> That of course leads to the R CMD check NOTE on depending on too many
>>> packages.... I guess I should ignore that one.
>>>
>>> Best,
>>> Kasper
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fredhutch.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
>>>
>>>
>>>
>>> --
>>> Hervé Pagès
>>>
>>> Program in Computational Biology
>>> Division of Public Health Sciences
>>> Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N, M1-B514
>>> P.O. Box 19024
>>> Seattle, WA 98109-1024
>>>
>>> E-mail: hpages at fredhutch.org
>>> Phone: (206) 667-5791
>>> Fax: (206) 667-1319
>>>
>>>
>>>
>> --
>> Hervé Pagès
>>
>> Program in Computational Biology
>> Division of Public Health Sciences
>> Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N, M1-B514
>> P.O. Box 19024
>> Seattle, WA 98109-1024
>>
>> E-mail: hpages at fredhutch.org
>> Phone: (206) 667-5791
>> Fax: (206) 667-1319
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel