[Bioc-devel] GenomicFeatures and/or TxDb.Hsapiens.UCSC.hg19.knownGene issue: missing tibble

Martin Morgan mtmorgan.bioc using gmail.com
Sun Apr 26 23:53:52 CEST 2020


I spent a bit of time not understanding why you were being so complicated -- BiocManager::install() finds all CRAN / Bioc dependencies, there's no need to use remotes at all and for debugging purposes it just seemed (still seems?) like you were making trouble for yourself.

But eventually... I created a fake CRAN-style repository

$ tree my_repo/
my_repo/
├── bin
│   └── macosx
│       └── contrib
│           └── 4.0
│               └── PACKAGES
└── src
    └── contrib
        └── PACKAGES

The plain-text PACKAGES file is an index of the packages that are supposed to be available. So under the 'bin' tree I have

---
Package: foo
Version: 1.0.0
NeedsCompilation: true

Package: bar
Version: 1.0.0
Depends: foo


Package: baz
Version: 1.0.0
Depends: bar
---

baz depends on bar depends on foo, and binary versions are all at 1.0.0

Under the src tree I have

---
Package: foo
Version: 1.0.1
NeedsCompilation: true

Package: bar
Version: 1.0.0
Depends: foo


Package: baz
Version: 1.0.0
Depends: bar
---

with a more recent src for foo at version 1.0.1. I guess this is (almost) the situation with GenomeInfoDbData / tibble.
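
A minimal sketch of how such an index-only repository could be built from R, assuming a scratch directory under tempdir() rather than /tmp/my_repo. The helper name write_index is made up for illustration; it only writes the two PACKAGES files shown above, with no package tarballs behind them.

```r
## Build a throwaway CRAN-style repository containing only PACKAGES indexes.
## 'write_index' is a hypothetical helper, not part of any package.
repo <- file.path(tempdir(), "my_repo")
src_dir <- file.path(repo, "src", "contrib")
bin_dir <- file.path(repo, "bin", "macosx", "contrib", "4.0")
dir.create(src_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(bin_dir, recursive = TRUE, showWarnings = FALSE)

write_index <- function(dir, foo_version) {
    writeLines(c(
        "Package: foo",
        paste("Version:", foo_version),
        "NeedsCompilation: true",
        "",
        "Package: bar",
        "Version: 1.0.0",
        "Depends: foo",
        "",
        "Package: baz",
        "Version: 1.0.0",
        "Depends: bar"
    ), file.path(dir, "PACKAGES"))
}
write_index(bin_dir, "1.0.0")  # binaries: everything at 1.0.0
write_index(src_dir, "1.0.1")  # source: foo is one patch release ahead

## file:// URL for the repository root (Windows drive paths need an extra "/")
repo_path <- normalizePath(repo, winslash = "/")
repo_url <- paste0("file://", if (startsWith(repo_path, "/")) "" else "/", repo_path)

## Read the source index back, as available.packages() would for a real repo
src_db <- available.packages(repos = repo_url, type = "source")
src_db[, c("Package", "Version", "Depends")]
```

Because only the indexes exist, any attempt to install from this repository will fail at the download step, which is enough to observe the order in which R tries things.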

In an R session I have

> available.packages(repos="file:///tmp/my_repo/")
    Package Version Priority Depends Imports LinkingTo Suggests Enhances
foo "foo"   "1.0.1" NA       NA      NA      NA        NA       NA
bar "bar"   "1.0.0" NA       "foo"   NA      NA        NA       NA
baz "baz"   "1.0.0" NA       "bar"   NA      NA        NA       NA
    License License_is_FOSS License_restricts_use OS_type Archs MD5sum
foo NA      NA              NA                    NA      NA    NA
bar NA      NA              NA                    NA      NA    NA
baz NA      NA              NA                    NA      NA    NA
    NeedsCompilation File Repository
foo "true"           NA   "file:///tmp/my_repo/src/contrib"
bar NA               NA   "file:///tmp/my_repo/src/contrib"
baz NA               NA   "file:///tmp/my_repo/src/contrib"

I'll try to 'install' baz; it'll fail because there are no packages to install, but it's still informative...

> install.packages("baz", repos = "file:///tmp/my_repo")
Installing package into '/Users/ma38727/Library/R/4.0/Bioc/3.11/library'
(as 'lib' is unspecified)
also installing the dependencies 'foo', 'bar'


  There is a binary version available but the source version is later:
    binary source needs_compilation
foo  1.0.0  1.0.1              TRUE

Do you want to install from sources the package which needs compilation? (Yes/no/cancel) yes
Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
  package 'bar' does not exist on the local repository
Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
  package 'baz' does not exist on the local repository
installing the source package 'foo'

Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
  package 'foo' does not exist on the local repository

Note the order of downloads -- binaries first, then source as you found! (actually, this would 'work' because the binaries are installed without any test load, but in more complicated situations...)

On the other hand, if I answer 'no' to install the more recent source packages I get

  There is a binary version available but the source version is later:
    binary source needs_compilation
foo  1.0.0  1.0.1              TRUE
Do you want to install from sources the package which needs compilation? (Yes/no/cancel) no
Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
  package 'foo' does not exist on the local repository
Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
  package 'bar' does not exist on the local repository
Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
  package 'baz' does not exist on the local repository

installing in the order required for dependencies.
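
As a rough sketch of what "the order required for dependencies" means for this toy index, assuming a hand-written matrix in place of real available.packages() output: sorting packages by the number of their recursive dependencies gives one valid install order, since a package always has strictly more recursive dependencies than anything it depends on. This is only an illustration, not how install.packages() computes its order internally.

```r
## Hand-written index mirroring the fake repository (an assumption for
## illustration; normally this matrix comes from available.packages()).
db <- cbind(
    Package   = c("foo", "bar", "baz"),
    Version   = c("1.0.1", "1.0.0", "1.0.0"),
    Depends   = c(NA, "foo", "bar"),
    Imports   = NA_character_,
    LinkingTo = NA_character_
)
rownames(db) <- db[, "Package"]

## Recursive dependencies of each package
rdeps <- tools::package_dependencies(c("baz", "bar", "foo"), db, recursive = TRUE)

## Fewest recursive dependencies first: dependencies precede their dependents
install_order <- names(sort(lengths(rdeps)))
install_order
```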

If I remove baz from the source repository, I get a similar order of events, with an additional prompt about installing 'baz' from source.

I don't actually see, from the 'Binary packages' section of ?install.packages, how to get R to respond 'no' to the prompt to install the more recent source package foo, but still install the source-only package 'baz'...

Of course this is transient, occurring only when there are more recent sources than binaries; my own installation of TxDb on macOS found a binary tibble as current as the source, and went without problem.
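
A small sketch of how one might check whether a repository is currently in that transient state, assuming the binary and source versions have already been pulled out of the two indexes into named character vectors (the vectors below are the toy values from the fake repository, not real data):

```r
## Toy versions from the fake repository; in practice these would come from
## the "Version" column of the binary and source indexes.
bin_versions <- c(foo = "1.0.0", bar = "1.0.0", baz = "1.0.0")
src_versions <- c(foo = "1.0.1", bar = "1.0.0", baz = "1.0.0")

## Packages present in both indexes, compared as proper version numbers
common <- intersect(names(src_versions), names(bin_versions))
is_newer <- package_version(unname(src_versions[common])) >
    package_version(unname(bin_versions[common]))
source_is_newer <- common[is_newer]

## Packages with no binary at all (the situation for annotation packages)
source_only <- setdiff(names(src_versions), names(bin_versions))

source_is_newer
source_only
```

An empty source_is_newer would correspond to the situation where the binary is as current as the source and no prompt is expected.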

Martin

On 4/26/20, 4:48 PM, "Leonardo Collado Torres" <lcolladotor using gmail.com> wrote:

    Hi everyone,

    Charlotte, thank you very much! I didn't know about that issue on
    `remotes` and the fix attempts. Thank you for the info Martin!

    However, I have to report that it doesn't seem like switching from
    remotes::install_deps() to BiocManager::install() fixes the issue. I
    updated my GitHub Actions workflow to obtain the list of dependencies
    using remotes, but install them with BiocManager::install() instead of
    remotes::install_deps(). You can see this at
    https://github.com/leekgroup/derfinderPlot/blob/ea58939ac6bf13cae7d26951732914d96b5f7d07/.github/workflows/check-bioc.yml#L139-L149
    although I include the relevant lines of code below:

    ## Locate the package dependencies
    deps <- remotes::dev_package_deps(dependencies = TRUE)

    ## Install any that need to be updated using BiocManager to avoid
    ## the issues described at
    ## https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016675.html
    ## https://github.com/r-lib/remotes/issues/296
    remotes::install_cran("BiocManager")
    BiocManager::install(deps$package[deps$diff != 0])


    This still leads to TxDb.Hsapiens.UCSC.hg19.knownGene failing to
    install because GenomeInfoDbData is not available on both macOS and
    Windows (again, this doesn't fail on the Bioconductor devel docker).
    Here's for example the error on Windows
    https://github.com/leekgroup/derfinderPlot/runs/620055131?check_suite_focus=true#step:12:1077.
    Immediately after, GenomeInfoDbData does get installed
    https://github.com/leekgroup/derfinderPlot/runs/620055131?check_suite_focus=true#step:12:1100
    and after it, tibble
    https://github.com/leekgroup/derfinderPlot/runs/620055131?check_suite_focus=true#step:12:1174.

    Likely this issue only happens on Windows and macOS because of the
    availability of some packages in source form and others in binary
    form, unlike only using source versions in the Bioconductor docker
    run. However, maybe I need some other code to get all the
    dependencies of a given package in a different order, though I was
    hoping that BiocManager::install() would find the right order for me
    as it seems to try to do so already.

    Charlotte linked to
    https://github.com/r-lib/remotes/commit/88f302fe53864e4f27fc7b3897718fea9a8b1fa9.
    So maybe there's still something else to try to fix in remotes and/or
    BiocManager instead of the DESCRIPTION files of other packages like I
    initially thought of in this thread and in
    https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016671.html.

    Best,
    Leo



    On Sun, Apr 26, 2020 at 10:30 AM Martin Morgan <mtmorgan.bioc using gmail.com> wrote:
    >
    > Thanks Charlotte for the detective work.
    >
    >
    >
    > Annotation packages (TxDb, org, BSgenome, and GenomeInfoDbData, for instance) are distributed only as source. This was a decision made quite a while (years) ago, to save disk space (some of these packages are large, and hosting macOS and Windows binaries in addition to source triples disk space requirements) and on the rationale that the packages do not have C-level source code, so users do not need RTools or XCode (etc) to install from ‘source’. So in this context, and in the face of a buggy remotes package and installation of Bioconductor packages through non-standard approaches (BiocManager::install() for CRAN and Bioconductor packages and their dependencies uses base R commands only), I guess the behavior you document is really an (ongoing?) bug in the remotes package?
    >
    >
    >
    > Over the years the distribution of source-only annotation packages has caused problems, in particular when (usually Windows) users have temporary or library paths with spaces or non-ASCII characters. I believe that this upstream bug (in R’s handling of Windows paths) has been fixed in the 4.0.0 release, but the details are quite complicated and I have not been able to follow the discussion fully.
    >
    >
    >
    > Martin
    >
    >
    >
    > From: Charlotte Soneson <charlottesoneson using gmail.com>
    > Date: Sunday, April 26, 2020 at 5:32 AM
    > To: Martin Morgan <mtmorgan.bioc using gmail.com>
    > Cc: Leonardo Collado Torres <lcolladotor using gmail.com>, Bioc-devel <bioc-devel using r-project.org>
    > Subject: Re: [Bioc-devel] GenomicFeatures and/or TxDb.Hsapiens.UCSC.hg19.knownGene issue: missing tibble
    >
    >
    >
    > Hi Leo, Martin,
    >
    >
    >
    > it looks like this is related to an issue with the remotes package: https://github.com/r-lib/remotes/issues/296. It gets the installation order wrong, and tries to install source packages before binaries. This can be a problem with GenomeInfoDbData (which I think doesn’t have a binary, and which it looks like Leo is installing manually). The TxDb package also doesn’t seem to be available as a binary package, and currently the source package for tibble is newer than the Windows binary.
    >
    >
    >
    > According to the issue above, it should have been fixed in remotes v2.1.1 (https://github.com/r-lib/remotes/commit/88f302fe53864e4f27fc7b3897718fea9a8b1fa9). To try things out, I set up a minimal package with the only dependency being TxDb.Hsapiens.UCSC.hg19.knownGene (https://github.com/csoneson/testpkg), and checked it with GitHub Actions on macOS and Windows. It fails in both cases, since it’s trying to install TxDb.Hsapiens.UCSC.hg19.knownGene first (e.g. https://github.com/csoneson/testpkg/runs/619407291?check_suite_focus=true#step:7:533). If I depend instead on GenomicFeatures, everything builds fine (here we have a binary). It is using remotes v2.1.1 though, so perhaps this needs to be investigated further.
    >
    >
    >
    > Charlotte
    >
    >
    >
    > On 25 Apr 2020, at 22:20, Martin Morgan <mtmorgan.bioc using gmail.com> wrote:
    >
    >
    >
    > tibble is not a direct dependency of TxDb*.
    >
    >
    > db = available.packages(repos = BiocManager::repositories())
    > deps = tools::package_dependencies("TxDb.Hsapiens.UCSC.hg19.knownGene", db)
    > deps
    >
    > $TxDb.Hsapiens.UCSC.hg19.knownGene
    > [1] "GenomicFeatures" "AnnotationDbi"
    >
    > but it is an indirect dependency
    >
    >
    > deps = tools::package_dependencies("TxDb.Hsapiens.UCSC.hg19.knownGene", db, recursive=TRUE)
    > "tibble" %in% unlist(deps)
    >
    > [1] TRUE
    >
    > I did
    >
    >  deps1 = tools::package_dependencies("TxDb.Hsapiens.UCSC.hg19.knownGene", db, recursive=TRUE)
    >
    >  deps2 = tools::package_dependencies("tibble", db, recursive=TRUE, reverse=TRUE)
    >
    >  intersect(unlist(deps1), unlist(deps2))
    >  ## [1] "GenomicFeatures" "biomaRt"         "BiocFileCache"   "dbplyr"
    >  ## [5] "dplyr"
    >
    > I believe R checked for immediate dependencies, found that all of those for TxDb* and GenomicFeatures were available, and didn’t check further. I speculate that you removed tibble, or installed one of the packages in the above list, without satisfying the dependencies for that package. Or perhaps what the message is really trying to say is that it failed to load tibble (because it was installed in a previous version of the R toolchain?)
    >
    > It would be interesting to debug this further on your system, to understand the problem for other users.
    >
    > Martin
    >
    > On 4/25/20, 2:48 PM, "Bioc-devel on behalf of Leonardo Collado Torres" <bioc-devel-bounces using r-project.org on behalf of lcolladotor using gmail.com> wrote:
    >
    >    Hi Bioc-devel,
    >
    >    I think that there's a potential issue with either GenomicFeatures,
    >    TxDb.Hsapiens.UCSC.hg19.knownGene or an upstream package.
    >
    >
    >    On a fresh R 4.0 Windows installation with BioC 3.11, I get the
    >    following error message when installing
    >    TxDb.Hsapiens.UCSC.hg19.knownGene as shown at
    >    https://github.com/leekgroup/derfinderPlot/runs/618370463?check_suite_focus=true#step:13:1225.
    >
    >
    >    2020-04-25T18:32:26.0765748Z * installing *source* package
    >    'TxDb.Hsapiens.UCSC.hg19.knownGene' ...
    >    2020-04-25T18:32:26.0769789Z ** using staged installation
    >    2020-04-25T18:32:26.1001400Z ** R
    >    2020-04-25T18:32:26.1044734Z ** inst
    >    2020-04-25T18:32:26.2061605Z ** byte-compile and prepare package for
    >    lazy loading
    >    2020-04-25T18:32:30.7296724Z ##[error]Error: package or namespace load
    >    failed for 'GenomicFeatures' in loadNamespace(i, c(lib.loc,
    >    .libPaths()), versionCheck = vI[[i]]):
    >    2020-04-25T18:32:30.7305615Z ERROR: lazy loading failed for package
    >    'TxDb.Hsapiens.UCSC.hg19.knownGene'
    >    2020-04-25T18:32:30.7306686Z * removing
    >    'D:/a/_temp/Library/TxDb.Hsapiens.UCSC.hg19.knownGene'
    >    2020-04-25T18:32:30.7307196Z  there is no package called 'tibble'
    >    2020-04-25T18:32:30.7310561Z ##[error]Error: package 'GenomicFeatures'
    >    could not be loaded
    >    2020-04-25T18:32:30.7311805Z Execution halted
    >
    >    From looking at the bioc-devel landing pages for both GenomicFeatures
    >    and TxDb.Hsapiens.UCSC.hg19.knownGene, I see that tibble is not listed
    >    as a dependency for either package.
    >
    >    Best,
    >    Leo
    >
    >    _______________________________________________
    >    Bioc-devel using r-project.org mailing list
    >    https://stat.ethz.ch/mailman/listinfo/bioc-devel

