[Bioc-devel] new package for accessing some chemical and biological databases

Mike Smith gr|mbough @end|ng |rom gm@||@com
Fri Sep 13 16:14:53 CEST 2019


I've lost track of whether the infrastructure is actually used, but
certainly some packages have a 'longtests' folder, e.g.
https://github.com/LTLA/beachmat

On Fri, 13 Sep 2019 at 16:02, Kasper Daniel Hansen <kasperdanielhansen using gmail.com> wrote:

> We used to have (or at least discussed the possibility of) occasional
> extensive checking, so we could have
>   tests
>   long_tests
> (names made up).
>
> On Fri, Sep 13, 2019 at 9:50 AM Martin Morgan <mtmorgan.bioc using gmail.com> wrote:
>
> > Putting bioc-devel back in the loop.
> >
> > I think that the straightforward answer to your original query is 'no,
> > git submodules are not supported'.
> >
> > I think we'd carry on and say 'packages should be self-contained and
> > conform to the Bioconductor size and time constraints', so you cannot have
> > a very large package or a package that takes a long time to check, and you
> > can't download part of the package from some alternative source (except
> > perhaps AnnotationHub or ExperimentHub). I agree that the hubs are not
> > suitable for regularly updated files, and that they are meant for
> > biologically motivated rather than purely test-related data resources.
> >
> > While we 'could' make special accommodations on the build systems to
> > support your package, we have found that this is not a fruitful endeavor.
> >
> > A natural place to put files used in tests would be in the /tests
> > directory; these are not included in the installed package. But it seems
> > likely that including your tests would violate the time and / or space
> > limitations we place on packages.
> >
> > It seems likely that this leads to the question you pose below, namely:
> > how do you know that you're running on the build system, so that you can
> > perform more modest computations? This is similar to the situation
> > discussed here, where special resources are normally required:
> >
> >   https://stat.ethz.ch/pipermail/bioc-devel/2019-September/015518.html
> >
> > Herve seems unwilling to commit to an easy answer, perhaps because this
> > opens the door to people circumventing even minimal tests of their
> > package...
> >
> > Martin
> >
> > On 9/13/19, 7:49 AM, "Shepherd, Lori" <Lori.Shepherd using RoswellPark.org> wrote:
> >
> >
> >     I'm including Martin and Herve so they can give their opinions and
> >     chime in too, since you took this conversation off the mailing list.
> >
> >
> >     Could you please describe what you mean by "works transparently"?
> >
> >
> >     We realize there isn't a function to call; we were suggesting that you
> >     write a function that could be called.
> >
> >
> >     How does your caching system work? I would also advise looking into
> >     BiocFileCache, the package Bioconductor recommends for caching data
> >     files.
> >
> >
> >
> >
> >     The relevant documentation on the check environment can be found at
> >     https://github.com/Bioconductor/Contributions
> >
> >     especially
> >     https://github.com/Bioconductor/Contributions#r-cmd-check-environment
> >
> >
> >
> >     Please also be mindful of:
> >
> >     Submission Guidelines
> >     https://bioconductor.org/developers/package-submission/
> >
> >     Package Guidelines
> >     https://bioconductor.org/developers/package-guidelines/
> >
> >
> >
> >
> >     More specifically, on the single package builder we run:
> >     R CMD BiocCheckGitClone <package>
> >     R CMD build --keep-empty-dirs --no-resave-data <package>
> >
> >     R CMD check --no-vignettes --timings <package_tar>
> >
> >     R CMD BiocCheck --build-output-file=<path to R.out> --new-package <package_tar>
> >
> >
> >
> >     with the environment variables set up as described in the link above.
> >
> >
> >     Special files are not encouraged and, as far as I am aware, not
> >     allowed. Herve, who has more experience with the builders, may be able
> >     to chime in further here.
> >
> >
> >
> >
> >
> >
> >
> >     Lori Shepherd
> >     Bioconductor Core Team
> >     Roswell Park Cancer Institute
> >     Department of Biostatistics & Bioinformatics
> >     Elm & Carlton Streets
> >     Buffalo, New York 14263
> >
> >
> >     ________________________________________
> >     From: Pierrick Roger <pierrick.roger using cea.fr>
> >     Sent: Friday, September 13, 2019 2:48 AM
> >     To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
> >     Subject: Re: [Bioc-devel] new package for accessing some chemical and biological databases
> >
> >     Thank you for the example. However, I do not think it is relevant. This
> >     package has no examples, no tests and just one vignette. The `get`
> >     function is part of the interface, so it makes sense to use it inside
> >     the vignette. But for my package biodb, there is no function to call:
> >     the cache works transparently.
> >
> >     Could you please give me more details about the build process of
> >     packages in Bioconductor? Are there some environment variables set
> >     during the build so that a package can know it is being built or
> >     checked by Bioconductor? If this is the case, maybe I could add a tweak
> >     to my code in order to download the cache when needed.
> >     If not, would it be possible to have them defined, or to have a special
> >     file `bioc.yml` at the root of the package in which I could write a
> >     `prebuild_step` command for retrieving the cache from my public GitHub
> >     repository `biodb-cache`?
> >
> >     On Thu 12 Sep 19 17:12, Shepherd, Lori wrote:
> >     > Please look at SRAdb for an example of how we would recommend
> >     > keeping the data.
> >     >
> >     > Summary:
> >     > On GitHub, or wherever you would like to host and keep the data
> >     > current, please make sure it is publicly accessible. Within your
> >     > package, have a download function that retrieves the file from the
> >     > public location.
> >     >
> >     > It's not recommended, but this will be acceptable in this case.
> >     >
> >     > Thank you.
> >     >
> >     >
> >     >
> >     > ________________________________
> >     > From: Pierrick Roger <pierrick.roger using cea.fr>
> >     > Sent: Thursday, September 12, 2019 10:48 AM
> >     > To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
> >     > Subject: Re: [Bioc-devel] new package for accessing some chemical and biological databases
> >     >
> >     > Examples can be run without the cache, and vignettes can be built
> >     > without it too.
> >     > In fact, the cache system is part of the package, and can be used by
> >     > the user or turned off if not wanted or needed. Using the cache avoids
> >     > sending too many identical requests to the database servers.
> >     > So yes, users will access the databases directly, and use the cache
> >     > to speed up their code.
> >     >
> >     > I also use this same cache system when running `R CMD check` on
> >     > Travis-CI, for instance, in order to avoid spending too much time on
> >     > requests and getting errors returned by servers. Servers are not
> >     > always stable, and `R CMD check` will often fail if the cache is not
> >     > used.
> >     >
> >     > On Thu 12 Sep 19 11:36, Shepherd, Lori wrote:
> >     > > Couldn't the cache be a subset of data for the examples, vignettes,
> >     > > and tests that is fairly stable, and that either does not use the
> >     > > updated database or is updated less frequently? But wouldn't your
> >     > > code, in a user's case, go through the longer process of accessing
> >     > > the databases directly? Or was I misunderstanding?
> >     > >
> >     > >
> >     > >
> >     > > ________________________________
> >     > > From: Pierrick Roger <pierrick.roger using cea.fr>
> >     > > Sent: Thursday, September 12, 2019 3:18 AM
> >     > > To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
> >     > > Subject: Re: [Bioc-devel] new package for accessing some chemical and biological databases
> >     > >
> >     > > Thank you for your answer.
> >     > > The biodb-cache repository contains 63109 files (484 MB).
> >     > > Those files change regularly, since the output of the databases
> >     > > changes from time to time, and I also add new examples, vignettes
> >     > > and tests. Thus it is common for files to be removed or updated, or
> >     > > for new files to be added. After reading the ExperimentHub
> >     > > description, it seems to me that my usage would not be exactly
> >     > > compatible with its principles and definition. Am I wrong?
> >     > >
> >     > > On Wed 11 Sep 19 11:19, Shepherd, Lori wrote:
> >     > > > No, we do not currently allow such submodules in Bioconductor.
> >     > > >
> >     > > > How big is the object? I assume putting the data object in the
> >     > > > package increases the package size over the limit?
> >     > > >
> >     > > > If this is the case, we would recommend storing the data in
> >     > > > ExperimentHub. See [Creating An ExperimentHub package](https://bioconductor.org/packages/devel/bioc/vignettes/ExperimentHub/inst/doc/CreateAnExperimentHubPackage.html)
> >     > > >
> >     > > >
> >     > > >
> >     > > >
> >     > > >
> >     > > > ________________________________
> >     > > > From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Pierrick Roger <pierrick.roger using cea.fr>
> >     > > > Sent: Wednesday, September 11, 2019 5:04 AM
> >     > > > To: bioc-devel using r-project.org <bioc-devel using r-project.org>
> >     > > > Subject: [Bioc-devel] new package for accessing some chemical and biological databases
> >     > > >
> >     > > > Dear all,
> >     > > >
> >     > > > I'd like to submit my package biodb (https://github.com/pkrog/biodb)
> >     > > > in the near future.
> >     > > > The aim of this package is to provide unified access to diverse
> >     > > > databases (ChEBI, KEGG databases, UniProt, ...).
> >     > > > For running examples, building vignettes and running tests, I use a
> >     > > > cache that is stored in another GitHub repository
> >     > > > (https://github.com/pkrog/biodb-cache) and registered as a Git
> >     > > > submodule of biodb.
> >     > > > This cache is currently necessary, since accessing the databases
> >     > > > during "R CMD check" would lead to some connection errors and would
> >     > > > take too much time.
> >     > > > I would like to know if this scheme is acceptable for Bioconductor.
> >     > > >
> >     > > > Best regards,
> >     > > > --
> >     > > > Research engineer Pierrick Roger
> >     > > > http://www.cea-tech.fr | http://workflow4metabolomics.org | http://www.metabohub.fr
> >     > > > https://fr.linkedin.com/in/pkrog | https://github.com/pkrog
> >     > > > In varietate concordia.
> >     > > >
> >     > > > _______________________________________________
> >     > > > Bioc-devel using r-project.org mailing list
> >     > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >     > > >
> >     > > >
> >     > > > This email message may contain legally privileged and/or
> >     > > > confidential information. If you are not the intended recipient(s),
> >     > > > or the employee or agent responsible for the delivery of this
> >     > > > message to the intended recipient(s), you are hereby notified that
> >     > > > any disclosure, copying, distribution, or use of this email message
> >     > > > is prohibited. If you have received this message in error, please
> >     > > > notify the sender immediately by e-mail and delete this email
> >     > > > message from your computer. Thank you.
> >     > >
> >     >
> >
> >
> >
> >
> >
> >
>
>
> --
> Best,
> Kasper
>
>



