[R] parallel computing

McGehee, Robert Robert.McGehee at geodecapital.com
Thu May 25 19:57:19 CEST 2006


Moreno,
As much of my processor time is often spent doing basic linear algebra
operations (matrix inversion, quadratic programming, etc), I recently
recompiled R using a BLAS implementation (ATLAS) tuned for parallel
processing. The speed improvement for linear algebra operations was
significant on multiprocessor machines.

For example, with N set beforehand (e.g. N <- 1000), using:
system.time(x <- replicate(10, matrix(rnorm(N^2), N, N) %*%
                               matrix(rnorm(N^2), N, N)))

I benchmarked speed improvements of 10-20% where N is small (10-100) and
speed improvements of up to 6x (e.g. 8 seconds vs 48 seconds) when N is
large (1000+).
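
To repeat this comparison on your own machine, the one-liner above can be wrapped in a small function and run over several matrix sizes, once with the reference BLAS and once with the tuned one. This is only a sketch: time_matmul and the chosen sizes are illustrative names and values, not part of any package, and the timings depend entirely on the hardware and BLAS in use.

```r
# Hypothetical helper: elapsed seconds for `reps` products of two
# random N x N matrices (same workload as the one-liner above).
time_matmul <- function(N, reps = 10) {
  system.time(
    replicate(reps, matrix(rnorm(N^2), N, N) %*% matrix(rnorm(N^2), N, N))
  )[["elapsed"]]
}

# Illustrative sizes; raise them to 1000+ to see the large-N behaviour.
sizes <- c(10, 100, 500)
timings <- vapply(sizes, time_matmul, numeric(1))
names(timings) <- sizes
print(timings)
```

Running the same script under both R builds and dividing the two timing vectors gives the speed-up per matrix size.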

So for users with lots of linear algebra calculations interested in
parallel processing, I'd recommend always starting with (re-)compiling a
customized BLAS, if they have not done so already. ATLAS and GOTO are
the two most common BLAS implementations that I know of.

As for true parallel processing, I have not yet tried the
aforementioned R packages, but I did code up an internal package for
running very large simulations in parallel, in which a simple script is
re-run on multiple data sets. In this example I stored each data set in
a different numbered directory. The R script would go through each
directory, in order, looking for a flag.txt file. If such a file does
not exist, the processor puts a flag.txt in that directory, indicating
that that directory is in use, and starts processing the data. This
allows multiple processors/computers to work on very large simulations
in parallel without duplicating work. At one point I was able to muster
up 15-20 CPUs from spare Windows and Linux boxes to reduce the
simulation time from days to hours. Such a system would also be
easy to re-create without setting up MPI/PVM if your simulation /
project can be divided up in a similar way.
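
A minimal sketch of the flag-file scheme described above might look like the following. It is not the internal package itself: claim_dir and run_worker are hypothetical names, the directory names are assumed to be plain integers, and the demo uses a temporary directory. Note that checking for flag.txt and then creating it is not an atomic operation, so two workers starting at exactly the same moment could in principle claim the same directory.

```r
# Hypothetical helper: claim a data directory by dropping a flag file.
# Returns TRUE if this worker claimed it, FALSE if it was already taken.
claim_dir <- function(dir, flag = "flag.txt") {
  flag_path <- file.path(dir, flag)
  if (file.exists(flag_path)) return(FALSE)  # already in use
  file.create(flag_path)                     # mark the directory as in use
}

# Walk the numbered directories in order and process each unclaimed one.
# `process` is whatever function runs the simulation on one directory.
run_worker <- function(root, process) {
  dirs <- list.dirs(root, recursive = FALSE)
  dirs <- dirs[order(as.integer(basename(dirs)))]  # assumes numeric names
  done <- character(0)
  for (d in dirs) {
    if (claim_dir(d)) {
      process(d)
      done <- c(done, d)
    }
  }
  invisible(done)
}

# Demo: three numbered directories under a temporary root.
root <- tempfile("sim_demo")
for (i in 1:3) dir.create(file.path(root, as.character(i)), recursive = TRUE)
show_dir <- function(d) cat("processing", basename(d), "\n")
first  <- run_worker(root, show_dir)  # claims and processes all three
second <- run_worker(root, show_dir)  # finds every directory flagged
```

Each machine simply runs the same script against the shared directory tree; the second call above stands in for a second worker that arrives after all the work has been claimed.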

Cheers,
Robert


-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Martin Morgan
Sent: Thursday, May 25, 2006 1:17 PM
To: mb7312 at libero.it
Cc: r-help
Subject: Re: [R] parallel computing

Hi Moreno --

snow provides an easy interface to simple types of parallel
calculations (e.g., lapply in parallel). I quickly wanted to have more
direct control over how parallel computations were carried out, and
have been using Rmpi. Though in principle snow and Rmpi are 'easy' to
use, I found that they actually require a certain amount of
understanding about R objects and evaluation, and the underlying
communication library (MPI, or PVM).
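
As a rough illustration of "lapply in parallel", and of why some understanding of R evaluation is needed, here is a small sketch. It assumes the parallel package that ships with more recent versions of R and adopted snow's cluster interface; with snow itself, functions of the same names would be used after library(snow). The names offset and add_offset are made up for the example, and the cluster is just two local worker processes.

```r
library(parallel)

# Start two local worker processes.
cl <- makeCluster(2)

offset <- 10
add_offset <- function(x) x + offset

# The workers are separate R sessions and do not see `offset`
# unless it is exported to them explicitly.
clusterExport(cl, "offset")

# Parallel counterpart of lapply(1:4, add_offset).
res <- parLapply(cl, 1:4, add_offset)

stopCluster(cl)
print(unlist(res))
```

Without the clusterExport call the workers would fail to find offset, which is the kind of detail about objects and evaluation that has to be kept in mind.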

Hope that helps,

Martin

"mb7312 at libero.it" <mb7312 at libero.it> writes:

> Dear R users,
>
> I have access to a Sun cluster with multiple processors, a lot of
> RAM, and RedHat installed.  I want to take advantage of its
> power for a very time-consuming R routine.
>
> Which package should I use? I know there are snow, snowFT and
> other packages. Which is the best for my purpose?  Does anyone have
> experience with this?
>
> Thanks in advance.
>
> Moreno
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
