[Bioc-devel] new package for accessing some chemical and biological databases

Kasper Daniel Hansen k@@perd@n|e|h@n@en @end|ng |rom gm@||@com
Fri Sep 13 16:00:51 CEST 2019


We used to have (or at least discussed the possibility of) occasional
extensive checking, so a package could have two test directories:
  tests
  long_tests
(names made up).
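
A minimal sketch of how such a split might be driven from a shell
wrapper. It assumes a hypothetical environment variable RUN_LONG_TESTS
and the made-up directory names above; it is an illustration only, not a
description of how the Bioconductor build system actually works.

```shell
#!/bin/sh
# Sketch only: RUN_LONG_TESTS is a hypothetical variable, and "tests" /
# "long_tests" are the made-up directory names from the message above.

# Print the test directories that should be run, one per line.
# "tests" is always selected; "long_tests" only when RUN_LONG_TESTS=true.
select_test_dirs() {
    echo "tests"
    if [ "${RUN_LONG_TESTS:-false}" = "true" ]; then
        echo "long_tests"
    fi
}

# Run every R script found in the selected directories that exist,
# stopping at the first failing script.
run_selected_tests() {
    for dir in $(select_test_dirs); do
        [ -d "$dir" ] || continue
        for script in "$dir"/*.R; do
            [ -f "$script" ] || continue
            Rscript "$script" || return 1
        done
    done
}
```

Calling run_selected_tests from the package root would run only the
regular tests by default; setting RUN_LONG_TESTS=true beforehand would
add the extended ones.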

On Fri, Sep 13, 2019 at 9:50 AM Martin Morgan <mtmorgan.bioc using gmail.com>
wrote:

> Putting bioc-devel back in the loop.
>
> I think that the straightforward answer to your original query is 'no,
> Git submodules are not supported'.
>
> I think we'd carry on and say 'packages should be self-contained and
> conform to the Bioconductor size and time constraints', so you cannot have
> a very large package or a package that takes a long time to check, and you
> can't download part of the package from some alternative source (except
> perhaps AnnotationHub or ExperimentHub). I agree that the hubs are not
> suitable for regularly updated files, and that they are meant for
> biologically motivated rather than purely test-related data resources.
>
> While we 'could' make special accommodations on the build systems to
> support your package, we have found that this is not a fruitful endeavor.
>
> A natural place to put files used in tests would be in the /tests
> directory; these are not included in the installed package. But it seems
> likely that including your tests would violate the time and / or space
> limitations we place on packages.
>
> It seems likely that this leads to the question you pose below, which is
> how do you know that you're running on the build system so that you can
> perform more modest computations? This is similar to the case discussed
> here, where special resources are normally required:
>
>   https://stat.ethz.ch/pipermail/bioc-devel/2019-September/015518.html
>
> Herve seems not willing to commit to an easy answer, perhaps because this
> opens the door to people circumventing even minimal tests of their
> package...
>
> Martin
>
> On 9/13/19, 7:49 AM, "Shepherd, Lori" <Lori.Shepherd using RoswellPark.org>
> wrote:
>
>
>     I'm including Martin and Herve for their opinions and to chime in too
> since you took this conversation off the mailing list...
>
>
>     Could you please describe what you mean by works transparently?
>
>
>     We realize there isn't a function to call; we were suggesting you
> make a function to call that could be utilized.
>
>
>     How does your caching system work?  I would also advise looking into
> BiocFileCache - the Bioconductor suggested package for data caching of
> files.
>
>
>
>
>     The relevant files to look at for the environment calls can be found at
>     https://github.com/Bioconductor/Contributions
>
>     especially
>     https://github.com/Bioconductor/Contributions#r-cmd-check-environment
>
>
>
>     Please also be mindful of:
>
>     Submission Guidelines
>     https://bioconductor.org/developers/package-submission/
>
>     Package Guidelines
>     https://bioconductor.org/developers/package-guidelines/
>
>
>
>
>     More specifically on the single package builder we use:
>     R CMD BiocCheckGitClone <package>
>     R CMD build --keep-empty-dirs --no-resave-data  <package>
>
>     R CMD check --no-vignettes --timings <package_tar>
>
>     R CMD BiocCheck --build-output-file=<path to R.out> --new-package
> <package_tar>
>
>
>
>     With the environment variables set up as described in the above link
>
>
>     Special files are not encouraged and, as far as I am aware, not
> allowed.  Herve, who has more experience with the builders, may be able to
> chime in further here.
>
>
>
>
>
>
>
>     Lori Shepherd
>     Bioconductor Core Team
>     Roswell Park Cancer Institute
>     Department of Biostatistics & Bioinformatics
>     Elm & Carlton Streets
>     Buffalo, New York 14263
>
>
>     ________________________________________
>     From: Pierrick Roger <pierrick.roger using cea.fr>
>     Sent: Friday, September 13, 2019 2:48 AM
>     To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
>     Subject: Re: [Bioc-devel] new package for accessing some chemical and
> biological databases
>
>     Thank you for the example. However I do not think it is relevant. This
>     package has no examples, no tests and just one vignette. The `get`
>     function is part of the interface, so it makes sense to use it inside
>     the vignette. But for my package biodb, there is no function to call,
>     the cache works transparently.
>
>     Could you please give me more details about the build process of
> packages in
>     Bioconductor? Are there some environment variables set during the build
>     so a package can know it is being built or checked by Bioconductor? If
>     this is the case, maybe I could write a tweak in my code in order to
>     download the cache when needed.
>     If not, would it be possible to have them defined, or to have a
>     special file `bioc.yml` defined at the root of the package in which I
>     could write a `prebuild_step` command for retrieving the cache from my
>     public GitHub repository `biodb-cache`?
>
>     On Thu 12 Sep 19 17:12, Shepherd, Lori wrote:
>     > Please look at  SRAdb  for an example of how we would recommend
> keeping the data.
>     >
>     > Summary:
>     > On GitHub or wherever you would like to host and keep the data
> current, please make sure it is publicly accessible.  Within your package
> have a download function that retrieves the file from the public location.
>     >
>     > It's not recommended, but this will be acceptable in this case.
>     >
>     > Thank you.
>     >
>     >
>     > Lori Shepherd
>     >
>     > ________________________________
>     > From: Pierrick Roger <pierrick.roger using cea.fr>
>     > Sent: Thursday, September 12, 2019 10:48 AM
>     > To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
>     > Subject: Re: [Bioc-devel] new package for accessing some chemical
> and biological databases
>     >
>     > Examples can be run without the cache, and vignettes can be built
>     > without it too.
>     > In fact, the cache system is part of the package, and can be used by
> the
>     > user or turned off if not wanted or needed. Using the cache avoids
>     > sending too many identical requests to the database servers.
>     > So yes users will access the databases directly, and use the cache to
>     > speed up their code.
>     >
>     > I use this same cache system also while running `R CMD check` on
>     > Travis-CI for instance, in order to avoid taking too much time with
>     > requests and having errors returned by servers. Servers are not
> always
>     > stable, and often the `R CMD check` will fail if not using the cache.
>     >
>     > On Thu 12 Sep 19 11:36, Shepherd, Lori wrote:
>     > > Would the cache not be a subset of data for the examples,
> vignettes, and tests that could be fairly stable, and either not use the
> updated database or be updated less frequently?  And wouldn't your code, in
> a user's case, do the longer process
>      of accessing databases directly?  Or was I misunderstanding?
>     > >
>     > >
>     > > Lori Shepherd
>     > >
>     > > ________________________________
>     > > From: Pierrick Roger <pierrick.roger using cea.fr>
>     > > Sent: Thursday, September 12, 2019 3:18 AM
>     > > To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
>     > > Subject: Re: [Bioc-devel] new package for accessing some chemical
> and biological databases
>     > >
>     > > Thank you for your answer.
>     > > The biodb-cache repository contains 63109 files (484MB).
>     > > Those files change regularly, since the output of the databases
>     > > changes from time to time, and I also add new examples, vignettes
>     > > and tests.
>     > > Thus it is common that files are removed or updated or that new
> files
>     > > are added. After reading the ExperimentHub description, it seems
> to me
>     > > that my usage would not be exactly compatible with its principles
> and
>     > > definition. Am I wrong?
>     > >
>     > > On Wed 11 Sep 19 11:19, Shepherd, Lori wrote:
>     > > > No we do not allow such submodules currently in Bioconductor.
>     > > >
>     > > > How big is the object?  I assume putting the data object in the
> package increases the package size over the limit?
>     > > >
>     > > > If this is the case, we would recommend storing the data in
>     > > > ExperimentHub. See [Creating An ExperimentHub package](https://bioconductor.org/packages/devel/bioc/vignettes/ExperimentHub/inst/doc/CreateAnExperimentHubPackage.html)
>     > > >
>     > > >
>     > > >
>     > > >
>     > > > Lori Shepherd
>     > > >
>     > > > ________________________________
>     > > > From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf
> of Pierrick Roger <pierrick.roger using cea.fr>
>     > > > Sent: Wednesday, September 11, 2019 5:04 AM
>     > > > To: bioc-devel using r-project.org <bioc-devel using r-project.org>
>     > > > Subject: [Bioc-devel] new package for accessing some chemical
> and biological databases
>     > > >
>     > > > Dear all,
>     > > >
>     > > > I'd like to submit my package biodb (https://github.com/pkrog/biodb)
>     > > > in the near future.
>     > > > The aim of this package is to present a unified access to diverse
>     > > > databases (ChEBI, KEGG databases, Uniprot, ...).
>     > > > For running examples, building vignettes and running tests, I
> use a
>     > > > cache that is stored in another GitHub repository
>     > > > (https://github.com/pkrog/biodb-cache), and registered as a Git
> submodule of
>     > > > biodb.
>     > > > This cache is currently necessary, since accessing the databases
> during
>     > > > "R CMD check" would lead to some connection errors and would
> take too
>     > > > much time.
>     > > > I would like to know if this scheme is acceptable for
> Bioconductor.
>     > > >
>     > > > Best regards,
>     > > > --
>     > > > Research engineer Pierrick Roger
>     > > > http://www.cea-tech.fr | http://workflow4metabolomics.org | http://www.metabohub.fr
>     > > > https://fr.linkedin.com/in/pkrog | https://github.com/pkrog
>     > > > In varietate concordia.
>     > > >
>     > > > _______________________________________________
>     > > > Bioc-devel using r-project.org mailing list
>     > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
>     > > >
>     > > >
>     > > > This email message may contain legally privileged and/or
> confidential information.  If you are not the intended recipient(s), or the
> employee or agent responsible for the delivery of this message to the
> intended recipient(s), you are hereby notified that
>      any disclosure, copying, distribution, or use of this email message
> is prohibited.  If you have received this message in error, please notify
> the sender immediately by e-mail and delete this email message from your
> computer. Thank you.
>     > >
>     > >
>     > >
>     >
>     >
>     >
>
>
>
>
>
>


-- 
Best,
Kasper



