[Bioc-devel] External dependencies and reproducibility in all platforms

Fabricio de Almeida <fabricio_almeidasilva using hotmail.com>
Tue Aug 24 02:05:20 CEST 2021


Thank you for the suggestions, Hervé.

Indeed, the best approach is to document everything. I was considering using {basilisk} or {herper} to manage a conda environment for the functions that depend on external software, but I think they are designed only for Python code run through reticulate.

Is there a fast way to see all the software installed on the Bioc build system?


Best,


=========================


Fabrício de Almeida Silva

Undergraduate degree in Biological Sciences (UENF)

MSc. candidate in Plant Biotechnology (PGBV/UENF - RJ/Brazil)

Laboratório de Química e Função de Proteínas e Peptídeos (LQFPP/CBB/UENF - RJ/Brazil)

Personal website: https://almeidasilvaf.github.io

________________________________
From: Hervé Pagès <hpages.on.github using gmail.com>
Sent: Monday, August 23, 2021 18:53
To: Fabricio de Almeida <fabricio_almeidasilva using hotmail.com>; bioc-devel using r-project.org <bioc-devel using r-project.org>
Subject: Re: [Bioc-devel] External dependencies and reproducibility in all platforms

On 23/08/2021 16:35, Fabricio de Almeida wrote:
> Hi, Hervé.
>
>
> Thank you for making this clear to me. I will try to think of an optimal
> solution for this. The issue here is that my package works as the
> pipeline itself, similarly to how ORFik works.
>
> Out of curiosity, I just checked how ORFik and KnowSeq handle this
> situation:
>
>   * for STAR, for instance, ORFik simply comments out the function that
>     runs STAR in @examples
>     (https://github.com/Roleren/ORFik/blob/master/R/STAR.R). Quite a
>     hacky solution to avoid the overuse of \donttest{}.
>   * KnowSeq includes a function to download all external software
>     (https://github.com/CasedUgr/KnowSeq/blob/75d5d9f526f5b4ac561455a46884fe0a1860ffa0/R/sraToFastq.R),
>     and it includes \donttest{} in some functions.
>
>
> I will see if I can include \donttest{} in as many functions with
> external dependencies as I can and add some other dependencies in
> SystemRequirements to satisfy the 80% testable code in @examples.

We discourage this approach because it generally hurts reproducibility
and reliability of the software. It's unfortunate that other packages
are doing this.

A better approach is to make sure that all the steps in your pipeline
are automatically tested on a regular basis, even if that means that we
must install more things on the build machines. As long as these things
are easy to install (e.g. a simple 'apt-get install mafft' on Ubuntu) we
should be fine. Things might be a little bit more complicated on other
platforms, in which case you may need to consider disabling some
examples and/or tests on these platforms. But that should be the last
resort.
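
To see which external tools are actually present on a given machine before enabling the corresponding examples or tests, something along these lines could be used. This is only a minimal sketch: the helper name missing_tools is hypothetical, and the executable names passed to it (for instance mafft, STAR, salmon) are assumptions about how the tools are exposed on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are not on PATH.
# Prints each missing command on its own line.
# Returns 0 if every command was found, 1 otherwise.
missing_tools() {
    rc=0
    for tool in "$@"; do
        # 'command -v' is the POSIX way to test whether a command resolves.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_tools mafft STAR salmon` prints nothing and returns 0 only when all three commands are found; otherwise it lists the ones that still need to be installed.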

Hope this makes sense.

Thanks,
H.


>
>
> Best,
>
>
> ------------------------------------------------------------------------
> *From:* Hervé Pagès <hpages.on.github using gmail.com>
> *Sent:* Monday, August 23, 2021 16:57
> *To:* Fabricio de Almeida <fabricio_almeidasilva using hotmail.com>;
> bioc-devel using r-project.org <bioc-devel using r-project.org>
> *Subject:* Re: [Bioc-devel] External dependencies and reproducibility in
> all platforms
> Hi Fabricio,
>
> If your package requires external software/libraries/tools in order to
> pass 'R CMD build' and 'R CMD check', then please list them in the
> SystemRequirements field of your DESCRIPTION file. In addition, we
> kindly ask you to provide an INSTALL file in the top-level folder of
> your package source tree that documents how to install these external
> deps on all the supported platforms.
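>
> As a minimal sketch of such a field, assuming a hypothetical package
> named myRNAseqPipeline that needs the three tools mentioned in this
> thread (the package name and the free-text wording of the entry are
> illustrative assumptions, not taken from any real package):
>
> ```
> Package: myRNAseqPipeline
> SystemRequirements: SRA Toolkit, STAR, salmon
> ```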
>
> BTW I'm not sure that KnowSeq or ORFik have external system
> requirements. I don't see that they have a SystemRequirements field.
> Only openPrimeR has one but it's not clear to me that the package
> actually needs all the things listed there. For example, MAFFT is
> listed but we don't have it on the build machines.
>
> FWIW most packages avoid having to depend on external tools like
> SRAtoolkit, STAR or salmon by assuming that this step of the analysis
> was already taken care of, and by focusing on the downstream analysis.
> These packages often include the output of the upstream analysis as a
> small dataset and start from there.
>
> Hope this helps,
>
> Best,
> H.
>
>
> On 23/08/2021 07:10, Fabricio de Almeida wrote:
>> Dear Bioc developers,
>>
>> I am writing a package that has external dependencies, and I'd like to know what the best practices are for submitting this kind of package to Bioconductor.
>>
>> The external dependencies are standard RNA-seq analysis tools, such as SRAtoolkit, STAR and salmon. I have seen other Bioc packages with external dependencies, such as KnowSeq (https://bioconductor.org/packages/release/bioc/html/KnowSeq.html), ORFik (https://www.bioconductor.org/packages/release/bioc/html/ORFik.html), and openPrimeR (https://bioconductor.org/packages/release/bioc/html/openPrimeR.html), but it is not clear how they handle the dependencies in the Bioconductor build system.
>>
>> I have a conda environment containing all the dependencies + R 4.1.0, which works fine. However, conda is not the best option, as some dependencies may not be available on every OS, particularly Windows.
>>
>> Perhaps a Docker container with the dependencies on Ubuntu would ensure reproducibility on all platforms, but what should I do for the package to pass all checks in the Bioc build system?
>>
>> Any help is appreciated.
>>
>> Best,
>>
>>
>
> --
> Hervé Pagès
>
> Bioconductor Core Team
> hpages.on.github using gmail.com
