[Bioc-devel] new package for accessing some chemical and biological databases
Mike Smith
gr|mbough @end|ng |rom gm@||@com
Fri Sep 13 16:14:53 CEST 2019
I've lost track of whether the infrastructure is actually used, but
certainly some packages have a 'longtests' folder, e.g.
https://github.com/LTLA/beachmat
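
A minimal sketch of the two-folder idea, assuming a package source tree with a 'tests' folder and an optional 'longtests' folder. The variable RUN_LONG_TESTS and the function name select_test_dirs are made up for illustration; they are not something the Bioconductor builders set or provide.

```shell
# select_test_dirs: print the test folders that should be run.
# RUN_LONG_TESTS is a hypothetical opt-in switch, NOT a variable defined
# by the Bioconductor build system.
select_test_dirs() {
    dirs="tests"
    # Add the extensive tests only when explicitly requested and present.
    if [ "${RUN_LONG_TESTS:-false}" = "true" ] && [ -d longtests ]; then
        dirs="$dirs longtests"
    fi
    echo "$dirs"
}
```

Run from the package root, a plain call would print only the regular folder, while `RUN_LONG_TESTS=true select_test_dirs` would also list 'longtests' when that folder exists.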
On Fri, 13 Sep 2019 at 16:02, Kasper Daniel Hansen <
kasperdanielhansen using gmail.com> wrote:
> We used to have (? or at least discussed the possibility of) occasional
> extensive checking so we could have
> tests
> long_tests
> (names made up).
>
> On Fri, Sep 13, 2019 at 9:50 AM Martin Morgan <mtmorgan.bioc using gmail.com>
> wrote:
>
> > Putting bioc-devel back in the loop.
> >
> > I think that the straight-forward answer to your original query is 'no,
> > git modules are not supported'.
> >
> > I think we'd carry on and say 'packages should be self-contained and
> > conform to the Bioconductor size and time constraints', so you cannot have
> > a very large package or a package that takes a long time to check, and you
> > can't download part of the package from some alternative source (except
> > perhaps AnnotationHub or ExperimentHub). I agree that the hubs are not
> > suitable for regularly updated files, and that they are meant for
> > biologically motivated rather than purely test-related data resources.
> >
> > While we 'could' make special accommodations on the build systems to
> > support your package, we have found that this is not a fruitful endeavor.
> >
> > A natural place to put files used in tests would be in the /tests
> > directory; these are not included in the installed package. But it seems
> > likely that including your tests would violate the time and / or space
> > limitations we place on packages.
> >
> > It seems likely that this leads to the question you pose below, which is
> > how do you know that you're running on the build system so that you can
> > perform more modest computations? This is similar to the thread linked
> > here, where special resources are normally required:
> >
> > https://stat.ethz.ch/pipermail/bioc-devel/2019-September/015518.html
> >
> > Herve seems not willing to commit to an easy answer, perhaps because this
> > opens the door to people circumventing even minimal tests of their
> > package...
> >
> > Martin
> >
> > On 9/13/19, 7:49 AM, "Shepherd, Lori" <Lori.Shepherd using RoswellPark.org>
> > wrote:
> >
> >
> > I'm including Martin and Herve for their opinions and to chime in too
> > since you took this conversation off the mailing list...
> >
> >
> > Could you please describe what you mean by works transparently?
> >
> >
> > We realize there isn't a function to call - we were suggesting you
> > make a function to call that could be utilized
> >
> >
> > How does your caching system work? I would also advise looking into
> > BiocFileCache - the Bioconductor suggested package for data caching of
> > files.
> >
> >
> >
> >
> > The relevant files to look at for the environment calls can be found at
> > https://github.com/Bioconductor/Contributions
> >
> > especially
> >
> > https://github.com/Bioconductor/Contributions#r-cmd-check-environment
> >
> >
> >
> > Please also be mindful of:
> >
> > Submission Guidelines
> > https://bioconductor.org/developers/package-submission/
> >
> > Package Guidelines
> > https://bioconductor.org/developers/package-guidelines/
> >
> >
> >
> >
> > More specifically on the single package builder we use:
> > R CMD BiocCheckGitClone <package>
> > R CMD build --keep-empty-dirs --no-resave-data <package>
> >
> > R CMD check --no-vignettes --timings <package_tar>
> >
> > R CMD BiocCheck --build-output-file=<path to R.out> --new-package <package_tar>
> >
> >
> >
> > With the environment variables set up as described in the above link
> >
> >
> > Special files are not encouraged and, as far as I am aware, not
> > allowed. Herve, who has more experience with the builders, may be able to
> > chime in further here.
> >
> >
> >
> >
> >
> >
> >
> > Lori Shepherd
> > Bioconductor Core Team
> > Roswell Park Cancer Institute
> > Department of Biostatistics & Bioinformatics
> > Elm & Carlton Streets
> > Buffalo, New York 14263
> >
> >
> > ________________________________________
> > From: Pierrick Roger <pierrick.roger using cea.fr>
> > Sent: Friday, September 13, 2019 2:48 AM
> > To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
> > Subject: Re: [Bioc-devel] new package for accessing some chemical and
> > biological databases
> >
> > Thank you for the example. However, I do not think it is relevant. This
> > package has no examples, no tests and just one vignette. The `get`
> > function is part of the interface, so it makes sense to use it inside
> > the vignette. But for my package biodb, there is no function to call;
> > the cache works transparently.
> >
> > Could you please give me more details about the build process of
> > packages in Bioconductor? Are there some environment variables set
> > during the build so a package can know it is being built or checked by
> > Bioconductor? If this is the case, maybe I could write a tweak in my
> > code in order to download the cache when needed.
> > If not, would it be possible to have them defined, or to have a
> > special file `bioc.yml` defined at the root of the package in which I
> > could write a `prebuild_step` command for retrieving the cache from my
> > public GitHub repository `biodb-cache`?
> >
> > On Thu 12 Sep 19 17:12, Shepherd, Lori wrote:
> > > Please look at SRAdb for an example of how we would recommend
> > > keeping the data.
> > >
> > > Summary:
> > > On GitHub or wherever you would like to host and keep the data
> > > current, please make sure it is publicly accessible. Within your
> > > package, have a download function that retrieves the file from the
> > > public location.
> > >
> > > It's not recommended, but this will be acceptable in this case.
> > >
> > > Thank you.
> > >
> > >
> > > ________________________________
> > > From: Pierrick Roger <pierrick.roger using cea.fr>
> > > Sent: Thursday, September 12, 2019 10:48 AM
> > > To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
> > > Subject: Re: [Bioc-devel] new package for accessing some chemical
> > and biological databases
> > >
> > > Examples can be run without the cache, and vignettes can be built
> > > without it too.
> > > In fact, the cache system is part of the package, and can be used by
> > > the user or turned off if not wanted or needed. Using the cache avoids
> > > sending too many identical requests to the database servers.
> > > So yes, users will access the databases directly, and use the cache to
> > > speed up their code.
> > >
> > > I also use this same cache system while running `R CMD check` on
> > > Travis-CI, for instance, in order to avoid taking too much time with
> > > requests and getting errors returned by servers. Servers are not
> > > always stable, and `R CMD check` will often fail if not using the
> > > cache.
> > >
> > > On Thu 12 Sep 19 11:36, Shepherd, Lori wrote:
> > > > Would the cache not be a subset of data for the examples,
> > > > vignettes, and tests that could be fairly stable and not necessarily
> > > > use the updated database, or be updated less frequently? And
> > > > wouldn't your code, in a user's case, do the longer process of
> > > > accessing the databases directly? Or was I misunderstanding?
> > > >
> > > >
> > > > ________________________________
> > > > From: Pierrick Roger <pierrick.roger using cea.fr>
> > > > Sent: Thursday, September 12, 2019 3:18 AM
> > > > To: Shepherd, Lori <Lori.Shepherd using RoswellPark.org>
> > > > Subject: Re: [Bioc-devel] new package for accessing some chemical
> > and biological databases
> > > >
> > > > Thank you for your answer.
> > > > The biodb-cache repository contains 63109 files (484MB).
> > > > Those files change regularly, since the output of the databases
> > > > changes from time to time, and I also add new examples, vignettes
> > > > and tests.
> > > > Thus it is common that files are removed or updated or that new
> > > > files are added. After reading the ExperimentHub description, it
> > > > seems to me that my usage would not be exactly compatible with its
> > > > principles and definition. Am I wrong?
> > > >
> > > > On Wed 11 Sep 19 11:19, Shepherd, Lori wrote:
> > > > > No, we do not allow such submodules currently in Bioconductor.
> > > > >
> > > > > How big is the object? I assume putting the data object in the
> > > > > package increases the package size over the limit?
> > > > >
> > > > > If this is the case, we would recommend storing the data in
> > > > > ExperimentHub. See [Creating An ExperimentHub package](https://bioconductor.org/packages/devel/bioc/vignettes/ExperimentHub/inst/doc/CreateAnExperimentHubPackage.html)
> > > > >
> > > > >
> > > > > ________________________________
> > > > > From: Bioc-devel <bioc-devel-bounces using r-project.org> on behalf
> > of Pierrick Roger <pierrick.roger using cea.fr>
> > > > > Sent: Wednesday, September 11, 2019 5:04 AM
> > > > > To: bioc-devel using r-project.org <bioc-devel using r-project.org>
> > > > > Subject: [Bioc-devel] new package for accessing some chemical
> > and biological databases
> > > > >
> > > > > Dear all,
> > > > >
> > > > > I'd like to submit my package biodb
> > > > > (https://github.com/pkrog/biodb) in the near future.
> > > > > The aim of this package is to present a unified access to diverse
> > > > > databases (ChEBI, KEGG databases, Uniprot, ...).
> > > > > For running examples, building vignettes and running tests, I use
> > > > > a cache that is stored in another GitHub repository
> > > > > (https://github.com/pkrog/biodb-cache), and registered as a Git
> > > > > submodule of biodb.
> > > > > This cache is currently necessary, since accessing the databases
> > > > > during "R CMD check" would lead to some connection errors and
> > > > > would take too much time.
> > > > > I would like to know if this scheme is acceptable for
> > > > > Bioconductor.
> > > > >
> > > > > Best regards,
> > > > > --
> > > > > Research engineer Pierrick Roger
> > > > > http://www.cea-tech.fr | http://workflow4metabolomics.org | http://www.metabohub.fr
> > > > > https://fr.linkedin.com/in/pkrog | https://github.com/pkrog
> > > > > In varietate concordia.
> > > > >
> > > > > _______________________________________________
> > > > > Bioc-devel using r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> > > > >
> > > > >
> > > > > This email message may contain legally privileged and/or
> > > > > confidential information. If you are not the intended recipient(s),
> > > > > or the employee or agent responsible for the delivery of this
> > > > > message to the intended recipient(s), you are hereby notified that
> > > > > any disclosure, copying, distribution, or use of this email message
> > > > > is prohibited. If you have received this message in error, please
> > > > > notify the sender immediately by e-mail and delete this email
> > > > > message from your computer. Thank you.
> > > >
> > >
> >
> >
>
>
> --
> Best,
> Kasper
>
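
The suggestion earlier in the thread, hosting the cache at a public location and having a download function fetch it when needed, can be sketched roughly as below. This is a shell illustration only, assuming `curl` is available; the function name fetch_cached and the `.part` suffix are invented for the example, and an R function inside the package (for instance one built on BiocFileCache) would follow the same logic.

```shell
# fetch_cached URL DEST: download URL to DEST unless DEST already exists.
# Hypothetical helper for illustration; not part of biodb or Bioconductor.
fetch_cached() {
    url=$1
    dest=$2
    # A non-empty local copy counts as a cache hit: no request is sent.
    if [ -s "$dest" ]; then
        echo "cache hit: $dest"
        return 0
    fi
    mkdir -p "$(dirname "$dest")"
    # Download to a temporary name first, so an interrupted transfer
    # never leaves a partial file that looks like a valid cache entry.
    tmp="$dest.part"
    if curl -fsSL -o "$tmp" "$url"; then
        mv "$tmp" "$dest"
        echo "downloaded: $dest"
    else
        rm -f "$tmp"
        echo "download failed: $url" >&2
        return 1
    fi
}
```

Checking the local copy first means that repeated runs of examples or tests send no request at all once a file is present, which is the behaviour the cache is meant to provide when the database servers are slow or unstable.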