[R] R and Multi threading

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Oct 8 09:50:43 CEST 2008


On Wed, 8 Oct 2008, jgarcia at ija.csic.es wrote:

> Dear prof, and list!
>
> I'm wondering what steps are needed to exploit multiple processors/cores if
> most of the processing time is spent in C code dynamically loaded into R.
> For example, a Monte Carlo analysis calls the C part a huge number of
> times, and it is this C part which takes most of the time.

But you may well be able to do those parts in parallel.  It depends on how 
the MCMC algorithm is organized.

> Will snow be useful for this anyway, or must multithreading be made
> explicit (I don't know how) within the C code, or is there nothing we can
> do?

Please do your own homework on what snow (etc) do, and on how multithreaded 
BLAS work (the ones I am familiar with are C code and use pthreads; 
OpenMP is another possibility).
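
As a rough illustration of the OpenMP route (a sketch only, not code from 
this thread): the names mc_pi, unif01 and next_u64, the toy problem of 
estimating pi, and the per-stream seeding scheme are all assumptions made 
for the example.  Each stream owns its random-number state, so no state is 
shared between threads, and the integer hit count is combined with a 
reduction.

```c
#include <stdint.h>

/* splitmix64-style generator: small and self-contained, so that every
 * stream can own its state. Not a vetted parallel RNG scheme. */
static uint64_t next_u64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1) built from the top 53 bits. */
static double unif01(uint64_t *state)
{
    return (double)(next_u64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/* Hypothetical kernel: estimate pi from n_streams * draws_per_stream
 * points in the unit square. Streams are independent, so the outer loop
 * can be shared among threads. */
double mc_pi(int n_streams, long draws_per_stream, uint64_t seed)
{
    long hits = 0;

    #pragma omp parallel for reduction(+:hits)
    for (int s = 0; s < n_streams; s++) {
        /* Assumed seeding: mix the stream index into the seed. */
        uint64_t state = seed ^ ((uint64_t)(s + 1) * 0xD1B54A32D192ED03ULL);
        long local = 0;
        for (long i = 0; i < draws_per_stream; i++) {
            double x = unif01(&state);
            double y = unif01(&state);
            if (x * x + y * y < 1.0)
                local++;
        }
        hits += local;
    }
    return 4.0 * (double)hits
               / ((double)n_streams * (double)draws_per_stream);
}
```

With gcc this would be compiled with -fopenmp; without that flag the pragma 
is ignored and the loop simply runs serially, giving the same result.  A 
wrapper callable from R is not shown, and the R API is generally not safe 
to call from inside the parallel region.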

Parallelization is (in general) hard and demands detailed understanding of 
the algorithms used (and of alternative algorithms).  For example, the 
early 1990s debate on single vs multiple runs for MCMC was all about a 
single CPU, and the conclusions will be different if many CPUs are 
available at no extra cost.
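
To make the single-vs-multiple-runs point concrete, here is a minimal 
sketch under stated assumptions: a toy random-walk Metropolis sampler for 
the density f(x) = 2x on (0, 1), whose mean is 2/3.  The names run_chain 
and pooled_mean, the target, and the seeding scheme are hypothetical.  Each 
chain pays its own burn-in, which is wasted effort on one CPU; but the 
chains share no state, so with one CPU per chain the wall-clock cost is 
that of a single short chain.

```c
#include <stdint.h>

/* splitmix64-style generator; each chain owns its state. */
static uint64_t next_u64(uint64_t *state)
{
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [0, 1). */
static double unif01(uint64_t *state)
{
    return (double)(next_u64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/* One random-walk Metropolis chain targeting f(x) = 2x on (0, 1).
 * Returns the mean of the n_keep draws taken after burn_in steps. */
double run_chain(uint64_t seed, long burn_in, long n_keep)
{
    uint64_t state = seed;
    double x = 0.5, sum = 0.0;

    for (long i = 0; i < burn_in + n_keep; i++) {
        /* Symmetric proposal: uniform step in [-0.5, 0.5). */
        double y = x + (unif01(&state) - 0.5);
        /* Reject proposals outside (0, 1); otherwise accept with
         * probability min(1, f(y) / f(x)) = min(1, y / x). */
        if (y > 0.0 && y < 1.0 && unif01(&state) * x < y)
            x = y;
        if (i >= burn_in)
            sum += x;
    }
    return sum / (double)n_keep;
}

/* Average of n_chains independent chains. The loop iterations do not
 * depend on each other, so they could be given to separate threads or
 * separate processes; they run serially here. */
double pooled_mean(int n_chains, long burn_in, long n_keep, uint64_t seed)
{
    double total = 0.0;
    for (int c = 0; c < n_chains; c++) {
        /* Assumed seeding: mix the chain index into the seed. */
        uint64_t chain_seed =
            seed ^ ((uint64_t)(c + 1) * 0xD1B54A32D192ED03ULL);
        total += run_chain(chain_seed, burn_in, n_keep);
    }
    return total / (double)n_chains;
}
```

For instance, pooled_mean(4, 1000, 50000, 7) and run_chain(7, 1000, 200000) 
keep the same number of draws; the first spends four burn-ins but splits 
into four pieces that can run at once.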

>
> Javier G.P
> ----
>
>
>> On Tue, 7 Oct 2008, pejpm wrote:
>>
>>>
>>> I will preface this message by saying that I am not an R developer and
>>> know very little about R... but here is my situation:
>>>
>>> One of my users has developed a model for analysing commodity prices. At the
>>> moment, when he runs this model on his daily data set, it takes roughly 5
>>> hours to complete. He is using a quad core PC with 2 GB of RAM. The R process
>>> only uses 1 core, i.e. the overall CPU usage tops out at around 25%. This
>>> has been a manageable situation for a while, but he would now like to run
>>> this model on 5 years of historical data. He has a colleague who ran the
>>> model on a 16 core Red Hat Linux box, but it took even longer to run. He has
>>> asked me for assistance in speeding up this process. I have a couple of
>>> questions:
>>>
>>> 1) Is it possible to run the Windows version of R across all four
>>> processors?
>>
>> No.
>>
>>> 2) I was under the impression that R for Linux supported multi-threading by
>>> default. Am I correct in this assumption? If not, is it possible for Linux R
>>> to multi-thread, and how do I go about configuring this?
>>
>> Your impression/assumption is wrong.
>>
>>> Apologies for the lack of detailed info in this post. I work in trade floor
>>> support and engineering and we don't really have much demand for this kind of
>>> heavy duty computational work, so I am learning as I investigate this issue.
>>
>> R runs as a single task.  It is possible that some of the support
>> functions (notably the BLAS) can be multithreaded, and this will often
>> (but not always) help if the task is intensive numerical linear algebra.
>> But even if a multithreaded BLAS is used (and it is not the default
>> build), the effect on a typical R task is very small.
>>
>> If you want to exploit multiple processors/cores you need to split up your
>> R job amongst multiple processes.  There are ways to help you do that
>> (packages snow and Rmpi, amongst others), but they need recoding of the
>> job to make use of them.
>>
>> --
>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford,             Tel:  +44 1865 272861 (self)
>> 1 South Parks Road,                     +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


