[Bioc-devel] new package for accessing some chemical and biological databases
Kasper Daniel Hansen
Fri Sep 13 16:00:51 CEST 2019
We used to have (or at least discussed the possibility of) occasional
extensive checking, so a package could have two test directories:
  tests
  long_tests
(names made up).
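
A minimal sketch of how a package could split quick and extensive tests in this spirit, using base R only. The environment variable `BIODB_LONG_TESTS` and the helpers `long_tests_enabled()` and `run_long_test()` are hypothetical names invented for illustration; nothing here is set or provided by the Bioconductor build system.

```r
# Hypothetical helper: decide whether the extensive ("long") tests should run.
# BIODB_LONG_TESTS is a made-up variable name, not one defined by the
# Bioconductor builders.
long_tests_enabled <- function(var = "BIODB_LONG_TESTS") {
  tolower(Sys.getenv(var, unset = "false")) %in% c("true", "1", "yes")
}

# Run a test function only when long tests are enabled; otherwise report a
# skip. Returns TRUE (invisibly) if the test ran, FALSE if it was skipped.
run_long_test <- function(name, test_fn) {
  if (!long_tests_enabled()) {
    message("skipping long test: ", name)
    return(invisible(FALSE))
  }
  test_fn()
  invisible(TRUE)
}

# Quick test: always runs and needs no network access.
stopifnot(nchar("ChEBI") == 5)

# Long test: in a real package this is where the remote databases
# would be queried.
run_long_test("remote database round trip", function() {
  stopifnot(TRUE)  # placeholder for the expensive checks
})
```

With the variable unset, only the quick check runs, so a time-limited build stays fast; a maintainer could export `BIODB_LONG_TESTS=true` locally or on a separate CI job to run everything.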
On Fri, Sep 13, 2019 at 9:50 AM Martin Morgan <mtmorgan.bioc using gmail.com>
wrote:
> Putting bioc-devel back in the loop.
>
> I think that the straightforward answer to your original query is 'no,
> git modules are not supported'.
>
> I think we'd carry on and say 'packages should be self-contained and
> conform to the Bioconductor size and time constraints', so you cannot have
> a very large package or a package that takes a long time to check, and you
> can't download part of the package from some alternative source (except
> perhaps AnnotationHub or ExperimentHub). I agree that the hubs are not
> suitable for regularly updated files, and that they are meant for
> biologically motivated rather than purely test-related data resources.
>
> While we 'could' make special accommodations on the build systems to
> support your package, we have found that this is not a fruitful endeavor.
>
> A natural place to put files used in tests would be in the /tests
> directory; these are not included in the installed package. But it seems
> likely that including your tests would violate the time and / or space
> limitations we place on packages.
>
> It seems likely that this leads to the question you pose below, which is:
> how do you know that you're running on the build system, so that you can
> perform more modest computations? This is similar to the following thread,
> where special resources are normally required:
>
> https://stat.ethz.ch/pipermail/bioc-devel/2019-September/015518.html
>
> Herve seems not willing to commit to an easy answer, perhaps because this
> opens the door to people circumventing even minimal tests of their
> package...
>
> Martin
>
> On 9/13/19, 7:49 AM, "Shepherd, Lori" <Lori.Shepherd using RoswellPark.org>
> wrote:
>
>
> I'm including Martin and Herve for their opinions and to chime in too
> since you took this conversation off the mailing list...
>
>
> Could you please describe what you mean by works transparently?
>
>
> We realize there isn't a function to call; we were suggesting you
> make a function to call that could be utilized.
>
>
> How does your caching system work? I would also advise looking into
> BiocFileCache - the Bioconductor suggested package for data caching of
> files.
>
>
>
>
> The relevant files to look at for the environment calls can be found at
> https://github.com/Bioconductor/Contributions
>
> especially:
> https://github.com/Bioconductor/Contributions#r-cmd-check-environment
>
>
>
> Please also be mindful of:
>
> Submission Guidelines
> https://bioconductor.org/developers/package-submission/
>
> Package Guidelines
> https://bioconductor.org/developers/package-guidelines/
>
>
>
>
> More specifically, on the single package builder we use:
>
> R CMD BiocCheckGitClone <package>
> R CMD build --keep-empty-dirs --no-resave-data <package>
> R CMD check --no-vignettes --timings <package_tar>
> R CMD BiocCheck --build-output-file=<path to R.out> --new-package <package_tar>
>
>
>
> With the environment variables set up as described in the above link
>
>
> Special files are not encouraged and, as far as I am aware, not
> allowed. Herve, who has more experience with the builders, may be able to
> chime in further here.
>
>
>
>
>
>
>
> Lori Shepherd
> Bioconductor Core Team
> Roswell Park Cancer Institute
> Department of Biostatistics & Bioinformatics
> Elm & Carlton Streets
> Buffalo, New York 14263
>
>
> ________________________________________
> From: Pierrick Roger <pierrick.roger using cea.fr>
> Sent: Friday, September 13, 2019 2:48 AM
> To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
> Subject: Re: [Bioc-devel] new package for accessing some chemical and
> biological databases
>
> Thank you for the example. However, I do not think it is relevant. This
> package has no examples, no tests and just one vignette. The `get`
> function is part of the interface, so it makes sense to use it inside
> the vignette. But for my package biodb, there is no function to call;
> the cache works transparently.
>
> Could you please give me more details about the build process of
> packages in Bioconductor? Are there some environment variables set
> during the build so a package can know it is being built or checked by
> Bioconductor? If this is the case, maybe I could write a tweak in my
> code in order to download the cache when needed.
> If not, would it be possible to have them defined, or to have a
> special file `bioc.yml` defined at the root of the package in which I
> could write a `prebuild_step` command for retrieving the cache from my
> public GitHub repo `biodb-cache`?
>
> On Thu 12 Sep 19 17:12, Shepherd, Lori wrote:
> > Please look at SRAdb for an example of how we would recommend
> > keeping the data.
> >
> > Summary:
> > On GitHub, or wherever you would like to host the data and keep it
> > current, please make sure it is publicly accessible. Within your
> > package, have a download function that retrieves the file from the
> > public location.
> >
> > It's not recommended, but this will be acceptable in this case.
> >
> > Thank you.
> >
> >
> > ________________________________
> > From: Pierrick Roger <pierrick.roger using cea.fr>
> > Sent: Thursday, September 12, 2019 10:48 AM
> > To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
> > Subject: Re: [Bioc-devel] new package for accessing some chemical and biological databases
> >
> > Examples can be run without the cache, and vignettes can be built
> > without it too.
> > In fact, the cache system is part of the package, and can be used by
> > the user or turned off if not wanted or needed. Using the cache avoids
> > sending too many identical requests to the database servers.
> > So yes, users will access the databases directly, and use the cache to
> > speed up their code.
> >
> > I also use this same cache system while running `R CMD check` on
> > Travis-CI, for instance, in order to avoid taking too much time with
> > requests and having errors returned by servers. Servers are not always
> > stable, and `R CMD check` will often fail if the cache is not used.
> >
> > On Thu 12 Sep 19 11:36, Shepherd, Lori wrote:
> > > Would the cache not be a subset of data used for the examples,
> > > vignettes, and tests, which could be fairly stable and would not
> > > necessarily track the updated database, or could be updated less
> > > frequently? And wouldn't your code, in a user's case, do the longer
> > > process of accessing the databases directly? Or was I misunderstanding?
> > >
> > >
> > > ________________________________
> > > From: Pierrick Roger <pierrick.roger using cea.fr>
> > > Sent: Thursday, September 12, 2019 3:18 AM
> > > To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
> > > Subject: Re: [Bioc-devel] new package for accessing some chemical and biological databases
> > >
> > > Thank you for your answer.
> > > The biodb-cache repository contains 63109 files (484MB).
> > > Those files change regularly, since the output of the databases changes
> > > from time to time, and I also add new examples, vignettes and tests.
> > > Thus it is common that files are removed or updated or that new files
> > > are added. After reading the ExperimentHub description, it seems to me
> > > that my usage would not be exactly compatible with its principles and
> > > definition. Am I wrong?
> > >
> > > On Wed 11 Sep 19 11:19, Shepherd, Lori wrote:
> > > > No, we do not currently allow such submodules in Bioconductor.
> > > >
> > > > How big is the object? I assume putting the data object in the
> > > > package increases the package size over the limit?
> > > >
> > > > If this is the case, we would recommend storing the data in
> > > > ExperimentHub. See "Creating An ExperimentHub package":
> > > > https://bioconductor.org/packages/devel/bioc/vignettes/ExperimentHub/inst/doc/CreateAnExperimentHubPackage.html
> > > >
> > > >
> > > > ________________________________
> > > > From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf of Pierrick Roger <pierrick.roger using cea.fr>
> > > > Sent: Wednesday, September 11, 2019 5:04 AM
> > > > To: bioc-devel using r-project.org <bioc-devel using r-project.org>
> > > > Subject: [Bioc-devel] new package for accessing some chemical and biological databases
> > > >
> > > > Dear all,
> > > >
> > > > I'd like to submit my package biodb (https://github.com/pkrog/biodb)
> > > > in the near future.
> > > > The aim of this package is to present a unified access to diverse
> > > > databases (ChEBI, KEGG databases, Uniprot, ...).
> > > > For running examples, building vignettes and running tests, I use a
> > > > cache that is stored in another GitHub repository
> > > > (https://github.com/pkrog/biodb-cache), and registered as a Git
> > > > submodule of biodb.
> > > > This cache is currently necessary, since accessing the databases
> > > > during "R CMD check" would lead to some connection errors and would
> > > > take too much time.
> > > > I would like to know if this scheme is acceptable for Bioconductor.
> > > >
> > > > Best regards,
> > > > --
> > > > Research engineer Pierrick Roger
> > > > http://www.cea-tech.fr | http://workflow4metabolomics.org | http://www.metabohub.fr
> > > > https://fr.linkedin.com/in/pkrog | https://github.com/pkrog
> > > > In varietate concordia.
> > > >
> > > > _______________________________________________
> > > > Bioc-devel using r-project.org mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > > >
> > > >
> > > > This email message may contain legally privileged and/or
> > > > confidential information. If you are not the intended recipient(s),
> > > > or the employee or agent responsible for the delivery of this message
> > > > to the intended recipient(s), you are hereby notified that any
> > > > disclosure, copying, distribution, or use of this email message is
> > > > prohibited. If you have received this message in error, please notify
> > > > the sender immediately by e-mail and delete this email message from
> > > > your computer. Thank you.
> > >
> >
>
>
--
Best,
Kasper