[R-pkg-devel] multithreading in packages

Vladimir Dergachev volodya using mindspring.com
Sat Oct 9 16:34:55 CEST 2021



On Sat, 9 Oct 2021, Ivan Krylov wrote:

> On Thu, 7 Oct 2021 21:58:08 -0400 (EDT),
> Vladimir Dergachev <volodya using mindspring.com> wrote:
>
>>    * My understanding from reading documentation and source code is
>> that there is no dedicated support in R yet, but there are packages
>> that use multithreading. Are there any plans for multithreading
>> support in future R versions ?
>
> Shared memory multithreading is hard to get right in a memory-safe
> language (e.g. R), but there's the parallel package, which is a part of
> base R, which offers process-based parallelism and may run your code on
> multiple machines at the same time. There's no communication _between_
> these machines, though. (But I think there's an MPI package on CRAN.)

Well, the way I planned to use multithreading is to speed up processing of
very large vectors, so one does not have to wait seconds for the command
to return. The same could be done for many built-in R primitives.
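
To illustrate the kind of vector speedup described above, here is a minimal
sketch in C, assuming OpenMP is available. The function name scale_shift and
its parameters are hypothetical and not taken from any existing package. The
pragma is guarded, so the same code compiles and runs serially when OpenMP is
not enabled. In an R package, the raw data pointers would have to be obtained
before the parallel region, since the R API is not safe to call from worker
threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: y[i] = a * x[i] + b over a (possibly very large) vector.
 * With OpenMP enabled (e.g. -fopenmp) the loop is split across up to
 * `nthreads` threads; otherwise it runs as an ordinary serial loop. */
void scale_shift(const double *x, double *y, size_t n,
                 double a, double b, int nthreads)
{
    long i;

    assert(x != NULL && y != NULL);
    if (nthreads < 1)
        nthreads = 1;   /* never ask for zero or negative threads */

    /* Each iteration touches only its own element, so no locking is needed. */
#ifdef _OPENMP
#pragma omp parallel for num_threads(nthreads) schedule(static)
#endif
    for (i = 0; i < (long) n; i++)
        y[i] = a * x[i] + b;
}
```

Whether this pays off depends on the vector length: for short vectors the
cost of starting threads can exceed the work itself, so a size threshold
below which the loop stays serial is a common design choice.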

>
>>    * pthread or openmp ? I am particularly concerned about
>> interaction with other packages. I have seen that using pthread and
>> openmp libraries simultaneously can result in incorrectly pinned
>> threads.
>
> pthreads-based code could be harder to run on Windows (which is a
> first-class platform for R, expected to be supported by most packages).

Gábor Csárdi pointed out that R is compiled with mingw on Windows and 
has pthread support - something I did not know either.

> OpenMP should be cross-platform, but Apple compilers are sometimes
> lacking; this has likely been solved since I last heard about
> it. If your problem can be made embarrassingly parallel, you're welcome
> to use the parallel package.

I have used parallel before; it is very nice, but R-level only. I am looking
for something to speed up the response of individual package functions so they
themselves can be used as part of more complicated code.

>
>>    * control of maximum number of threads. One can default to openmp
>> environment variable, but these might vary between openmp
>> implementations.
>
> Moreover, CRAN-facing tests aren't allowed to consume more than 200%
> CPU, so it's a good idea to leave the number of workers in control of
> the user. According to a reference guide I got from openmp.org, OpenMP
> implementations are expected to understand omp_set_num_threads() and
> the OMP_NUM_THREADS environment variable.

Oh, this would never be run through CRAN tests; it is meant for data that
is too big for CRAN.

I seem to remember that the Intel compiler used a different environment
variable, but this may have been fixed since the last time I used it.
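
One way to keep the thread count under the user's control, as discussed
above, is to resolve it in a single place. The following is a minimal sketch
in C; the helper name resolve_num_threads and the hard_cap parameter are
hypothetical. It assumes that an explicit request takes priority, that
OMP_NUM_THREADS is the fallback, and that one thread is the default when
neither is given. Any vendor-specific variables would have to be handled
separately.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: decide how many threads to use.
 *   requested > 0  -> use the caller's value
 *   otherwise      -> use OMP_NUM_THREADS if it holds a positive number
 *   otherwise      -> fall back to a single thread
 * The result is always clamped to the range [1, hard_cap]. */
int resolve_num_threads(int requested, int hard_cap)
{
    int n = requested;

    assert(hard_cap >= 1);

    if (n <= 0) {
        const char *env = getenv("OMP_NUM_THREADS");
        n = 1;
        if (env != NULL) {
            /* OMP_NUM_THREADS may be a comma-separated list for nested
             * parallelism; strtol reads only the first entry. */
            long v = strtol(env, NULL, 10);
            if (v >= 1)
                n = (v > hard_cap) ? hard_cap : (int) v;
        }
    }
    if (n > hard_cap)
        n = hard_cap;

#ifdef _OPENMP
    omp_set_num_threads(n);   /* applies to later parallel regions */
#endif
    return n;
}
```

A package could call such a helper once per exported function, passing a
user-supplied argument or option as `requested`.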

best

Vladimir Dergachev

>
> -- 
> Best regards,
> Ivan
>

