[Bioc-devel] External dependencies and reproducibility in all platforms

Hervé Pagès hpages.on.github using gmail.com
Tue Aug 24 01:53:57 CEST 2021


On 23/08/2021 16:35, Fabricio de Almeida wrote:
> Hi, Hervé.
> 
> 
> Thank you for making this clear to me. I will try to think of an optimal 
> solution for this. The issue here is that my package works as the 
> pipeline itself, similarly to how ORFik works.
> 
> Out of curiosity, I just checked how ORFik and KnowSeq handle this 
> situation:
> 
>   * for STAR, for instance, ORFik simply comments out the function that
>     runs STAR in @examples
>     (https://github.com/Roleren/ORFik/blob/master/R/STAR.R). Quite a
>     hacky solution to avoid the overuse of \donttest{}.
>   * KnowSeq includes a function to download all external software
>     (https://github.com/CasedUgr/KnowSeq/blob/75d5d9f526f5b4ac561455a46884fe0a1860ffa0/R/sraToFastq.R),
>     and it uses \donttest{} in some functions.
> 
> 
> I will see if I can use \donttest{} in as many functions with 
> external dependencies as I can, and list some other dependencies in 
> SystemRequirements, to satisfy the requirement of 80% testable code in 
> @examples.

We discourage this approach because it generally hurts reproducibility 
and reliability of the software. It's unfortunate that other packages 
are doing this.

A better approach is to make sure that all the steps in your pipeline 
are automatically tested on a regular basis, even if that means that we 
must install more things on the build machines. As long as these things 
are easy to install (e.g. a simple 'apt-get install mafft' on Ubuntu) we 
should be fine. Things might be a little bit more complicated on other 
platforms, in which case you may need to consider disabling some 
examples and/or tests on these platforms. But that should be the last 
resort.
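
To make the build-machine side of this concrete, here is a minimal sketch 
(not something the Bioconductor build system provides) of a shell helper 
that checks whether the external tools are on the PATH before 'R CMD check' 
is run. The function name check_tools and the tarball name in the usage 
comment are made up for illustration; mafft, STAR and salmon are simply the 
tools mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given command-line tools are
# available on the PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (assumed tool and file names, adjust to your package):
#   check_tools mafft STAR salmon && R CMD check mypkg_0.99.0.tar.gz
```

Running the check first means a missing tool shows up as an explicit 
message rather than as an obscure failure in the middle of an example or 
test.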

Hope this makes sense.

Thanks,
H.


> 
> 
> Best,
> 
> /=========================/
> 
> /Fabrício de Almeida Silva/
> 
> /Undergraduate degree in Biological Sciences (UENF)/
> 
> /MSc. candidate in Plant Biotechnology (PGBV/UENF - RJ/Brazil)/
> 
> /Laboratório de Química e Função de Proteínas e Peptídeos 
> (LQFPP/CBB/UENF - RJ/Brazil)/
> 
> /Personal website: /https://almeidasilvaf.github.io
> 
> 
> ------------------------------------------------------------------------
> *From:* Hervé Pagès <hpages.on.github using gmail.com>
> *Sent:* Monday, 23 August 2021 16:57
> *To:* Fabricio de Almeida <fabricio_almeidasilva using hotmail.com>; 
> bioc-devel using r-project.org <bioc-devel using r-project.org>
> *Subject:* Re: [Bioc-devel] External dependencies and reproducibility in 
> all platforms
> Hi Fabricio,
> 
> If your package requires external software/libraries/tools in order to
> pass 'R CMD build' and 'R CMD check', then please list them in the
> SystemRequirements field of your DESCRIPTION file. In addition, we
> kindly ask you to provide an INSTALL file in the top-level folder of
> your package source tree that documents how to install these external
> deps on all the supported platforms.
> 
> BTW I'm not sure that KnowSeq or ORFik have external system
> requirements. I don't see that they have a SystemRequirements field.
> Only openPrimeR has one, but it's not clear to me that the package
> actually needs all the things listed there, e.g. MAFFT is
> listed but we don't have it on the build machines.
> 
> FWIW most packages avoid having to depend on external tools like
> SRAtoolkit, STAR or salmon by assuming that this step of the analysis
> was already taken care of, and by focusing on the downstream analysis.
> These packages often include the output of the upstream analysis as a
> small dataset and start from there.
> 
> Hope this helps,
> 
> Best,
> H.
> 
> 
> On 23/08/2021 07:10, Fabricio de Almeida wrote:
>> Dear Bioc developers,
>> 
>> I am writing a package that has external dependencies, and I'd like to know what the best practices are for submitting this kind of package to Bioconductor.
>> 
>> The external dependencies are standard RNA-seq analysis algorithms, such as SRAtoolkit, STAR and salmon. I have seen other Bioc packages with external dependencies, such as KnowSeq (https://bioconductor.org/packages/release/bioc/html/KnowSeq.html), ORFik (https://www.bioconductor.org/packages/release/bioc/html/ORFik.html), and openPrimeR (https://bioconductor.org/packages/release/bioc/html/openPrimeR.html), but it is not clear how they handle the dependencies in the Bioconductor build system.
>> 
>> I have a conda environment containing all the dependencies + R 4.1.0, which works fine. However, conda is not the best option, as some dependencies may not be available on all operating systems, particularly Windows.
>> 
>> Perhaps a Docker container with the dependencies in an Ubuntu OS would ensure reproducibility in all platforms, but what should I do for the package to pass all checks in the Bioc build system?
>> 
>> Any help is appreciated.
>> 
>> Best,
>> 
>> 
>> =========================
>> 
>> 
>> Fabrício de Almeida Silva
>> 
>> Undergraduate degree in Biological Sciences (UENF)
>> 
>> MSc. candidate in Plant Biotechnology (PGBV/UENF - RJ/Brazil)
>> 
>> Laboratório de Química e Função de Proteínas e Peptídeos (LQFPP/CBB/UENF - RJ/Brazil)
>> 
>> Personal website: https://almeidasilvaf.github.io
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Bioc-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>> 
> 
> -- 
> Hervé Pagès
> 
> Bioconductor Core Team
> hpages.on.github using gmail.com

-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.github using gmail.com


