[Bioc-devel] C++ parallel computing

Aaron Lun
Wed May 26 09:29:16 CEST 2021


Incidentally, I was reflecting on this topic the other day and was 
wondering whether BiocParallel could have something like OpenMPParam() 
that sets the number of threads to some non-zero value via 
omp_set_num_threads(). This would provide a consistent framework through 
which users could control OpenMP behavior in suitably written functions.
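
For concreteness, here is a minimal C++ sketch of what "suitably written" 
might look like. The names set_omp_threads() and sum_squares() are made up 
for illustration and are not part of BiocParallel or ShortRead; the only 
real calls are omp_get_max_threads() and omp_set_num_threads(). The 
_OPENMP guard is an assumption about how one would keep the code compiling 
on toolchains without OpenMP support.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical set-up hook: cap the OpenMP thread count and return the
// previous cap so the caller can restore it afterwards.
// Without OpenMP this is a no-op that reports a single thread.
int set_omp_threads(int nthreads) {
#ifdef _OPENMP
    if (nthreads < 1) nthreads = 1;
    const int previous = omp_get_max_threads();
    omp_set_num_threads(nthreads);
    return previous;
#else
    (void) nthreads;
    return 1;
#endif
}

// A function with no thread argument of its own: the parallel region
// simply uses whatever cap is currently in effect.
double sum_squares(const std::vector<double>& x) {
    double total = 0.0;
    const long n = static_cast<long>(x.size());
#ifdef _OPENMP
#pragma omp parallel for reduction(+ : total)
#endif
    for (long i = 0; i < n; ++i) {
        total += x[i] * x[i];
    }
    return total;
}
```

The idea would be that the set-up code calls set_omp_threads() once per 
process before any tasks run, so individual functions like sum_squares() 
do not each need their own thread-count argument.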

One could even imagine having a composition design where a caller could 
assemble a BPPARAM object like:

bplapply(..., BPPARAM=OpenMPParam(SnowParam(5), 2))

which tells bplapply to spin up 5 workers, each of which is allowed to 
use up to 2 threads. Implementation-wise, it would be a relatively 
simple matter of stuffing an extra set-up command into .composeTry; the 
nthread-setting code can be borrowed from ShortRead.

For context: I am planning on moving more parallelization in my packages 
into OpenMP to get around the overhead of the other backends. Forking is 
the only approach that is remotely fast enough, but the interaction of 
forks with the GC is too chaotic in memory-limited environments.

-A

On 5/25/21 10:39 AM, Martin Morgan wrote:
> If the BAM files are each processed independently, and each processing task takes a while, then it is probably 'good enough' to use R-level parallel evaluation using BiocParallel (currently the recommendation for Bioconductor packages) or another evaluation framework. Also, presumably you will use Rhtslib, which provides C-level access to the hts library. This will require writing C / C++ code to interface between R and the hts library, and will of course be a significant undertaking.
> 
> It might be worth outlining in a bit more detail what your task is and how (not too much detail!) you've tried to implement this in Rsamtools.
> 
> Martin Morgan
> 
> On 5/24/21, 10:01 AM, "Bioc-devel on behalf of Oleksii Nikolaienko" <bioc-devel-bounces using r-project.org on behalf of oleksii.nikolaienko using gmail.com> wrote:
> 
>      Dear Bioc team,
>      I'd like to ask for your advice on the parallelization within a Bioc
>      package. Please point me to a better place if this mailing list is not
>      appropriate.
>      After a bit of thinking I decided that I'd like to parallelize processing
>      at the level of C++ code. Would you strongly recommend not to and use an R
>      approach instead (e.g. "future")?
>      If parallel C++ is ok, what would be the best solution for all major OSs?
>      My initial choice was OpenMP, but then it seems that Apple has something
>      against it (https://mac.r-project.org/openmp/). My own dev environment is
>      mostly Big Sur/ARM64, but I wouldn't want to drop its support anyway.
> 
>      (On the actual task: loading and specific processing of very large BAM
>      files, ideally significantly faster than by means of Rsamtools as a backend)
> 
>      Best,
>      Oleksii Nikolaienko
> 
>      _______________________________________________
>      Bioc-devel using r-project.org mailing list
>      https://stat.ethz.ch/mailman/listinfo/bioc-devel