[R-pkg-devel] [Tagged] Re: multithreading in packages

Vladimir Dergachev
Sat Oct 9 20:08:39 CEST 2021



On Sat, 9 Oct 2021, Viechtbauer, Wolfgang (SP) wrote:

> One thing I did not see mentioned in this thread (pun intended) so far:
>
> For what kind of computations is multithreading supposed to be used within the package being developed? If the computations involve a lot of linear/matrix algebra, then one could just use R with other linear algebra routines (e.g., OpenBLAS, ATLAS, MKL, BLIS) and get the performance benefits of multicore processing of those computations without having to change a single line of code in the package (although in my experience, most of the performance benefits come from switching to something like OpenBLAS and using it single-threaded).

This is meant for the RMVL package, which memory maps MVL format files for 
direct access. The package also provides database functionality.
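
To illustrate what "memory maps ... for direct access" means in practice, here is a
minimal sketch in Python (used only for brevity; this is not RMVL code). It assumes a
hypothetical file that is nothing more than a raw array of little-endian float64
values, which is NOT the actual MVL layout. The helper names make_demo_file and
read_slice are invented for this example.

```python
import mmap
import os
import struct
import tempfile

ITEM_SIZE = 8  # bytes per float64


def make_demo_file(n):
    """Write 0.0, 1.0, ..., n-1 as raw little-endian float64 and return the path.

    Hypothetical stand-in for a real data file; the MVL format is different.
    """
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(struct.pack("<%dd" % n, *[float(i) for i in range(n)]))
    return path


def read_slice(path, start, count):
    """Map the file read-only and decode only elements [start, start + count).

    The operating system pages in just the parts of the file that are touched,
    so the cost depends on the slice, not on the total file size.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            values = struct.unpack_from("<%dd" % count, mm, start * ITEM_SIZE)
    return list(values)
```

For example, read_slice(make_demo_file(1000), 10, 3) decodes three values starting at
element 10 without reading the rest of the file into memory.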

The files I am interested in are large. For example, the Gaia DR3 dataset 
is over 500 GB.

Plain linear algebra will likely not need multithreading - the computation 
will proceed at the speed of storage I/O (which is quite impressive 
nowadays).

But it will be useful to multithread more involved code that builds or 
queries indices, and I was also thinking of some functions to assist with 
visualization - plot() and xyplot() were not meant for very long vectors.
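
One way such a visualization helper could work is to reduce a very long vector to a
(min, max) pair per bin before plotting, with the bins handed out to a pool of
workers. The sketch below is in Python for brevity and is only an assumption about
the approach, not RMVL code; the names bin_min_max and downsample are invented here.
The number of workers defaults to 1, so parallelism is opt-in. Note that in standard
CPython builds threads do not speed up pure-Python loops like this one; in a package
the per-bin work would be done in compiled code.

```python
from concurrent.futures import ThreadPoolExecutor


def bin_min_max(values, start, stop):
    """Return (min, max) of values[start:stop]; the range must be non-empty."""
    chunk = values[start:stop]
    return min(chunk), max(chunk)


def downsample(values, n_bins, n_workers=1):
    """Reduce values to at most n_bins (min, max) pairs, in order.

    Bins are contiguous and cover the whole vector. Each bin is an independent
    task, so the tasks can be mapped over a worker pool.
    """
    n = len(values)
    if n == 0 or n_bins <= 0:
        return []
    n_bins = min(n_bins, n)  # guarantees every bin holds at least one element
    edges = [i * n // n_bins for i in range(n_bins + 1)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves the order of the bins in its results
        return list(
            pool.map(
                lambda i: bin_min_max(values, edges[i], edges[i + 1]),
                range(n_bins),
            )
        )
```

Plotting the per-bin minima and maxima keeps outliers visible while drawing only
n_bins points instead of the full vector.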

Ideally, one would be able to explore such large data sets interactively.
And then do more interesting things on the cluster.

>
> This aside, I am personally more in favor of explicitly parallelizing those things that are known to be embarrassingly parallelizable using packages like parallel, future, etc. since a package author should know best when these situations arise and can take the necessary steps to parallelize those computations -- but making the use of parallel processing in these cases an option, not a default. I have seen way too many cases in HPC environments where jobs are being parallelized, the package is doing parallel processing, and multicore linear algebra routines are being used all simultaneously, which is just a disaster.
>
> Finally, I don't think the HPC task view has been mentioned so far:
>
> https://cran.r-project.org/web/views/HighPerformanceComputing.html

Thanks for the link!

I see there is an OpenCL package, very interesting.

Best,

Vladimir Dergachev

>
> (not even by Dirk just now, who maintains it!)
>
> Best,
> Wolfgang
>
>> -----Original Message-----
>> From: R-package-devel [mailto:r-package-devel-bounces using r-project.org] On Behalf Of
>> Dirk Eddelbuettel
>> Sent: Saturday, 09 October, 2021 18:33
>> To: Ben Bolker
>> Cc: r-package-devel using r-project.org
>> Subject: Re: [R-pkg-devel] [Tagged] Re: multithreading in packages
>>
>>
>> On 9 October 2021 at 12:08, Ben Bolker wrote:
>> |    FWIW there is some machinery in the glmmTMB package for querying,
>> | setting, etc. the number of OpenMP threads.
>> |
>> | https://github.com/glmmTMB/glmmTMB/search?q=omp
>>
>> https://cloud.r-project.org/package=RhpcBLASctl
>>
>> Dirk
>>
>> --
>> https://dirk.eddelbuettel.com | @eddelbuettel | edd using debian.org
>>
>> ______________________________________________
>> R-package-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-package-devel


