[R] embarrassingly parallel problem - simple loop solution

Martin Morgan mtmorgan at fhcrc.org
Fri Jul 11 03:51:12 CEST 2008


Hi Chris --

"Chris Gaiteri" <gaiteri at gmail.com> writes:

> I have an "embarrassingly parallel" routine that I need to run 24000^2/2
> times (based on some microarray data).  All I really need to do is
> parallelize a nested for-loop.  But I haven't found a clear list of what
> packages/commands I'd need to do this.  I've got a dual quad core xeon

Any of snow / Rmpi / nws / rpvm (the first has system requirements,
the other three additional software requirements) provide the basic
embarrassingly parallel functionality via variants of lapply, e.g.,
mpi.parLapply in Rmpi.
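
As a minimal sketch of the lapply-style approach, the code below
parallelizes the outer loop of a nested for-loop and keeps the inner
loop inside each task. It assumes the 'parallel' package bundled with
current versions of R (it postdates this thread, but takes its
makeCluster / parLapply / stopCluster interface from snow). The names
pair_stat and row_task and the toy matrix are hypothetical stand-ins
for the real routine and microarray data.

```r
library(parallel)

# Toy stand-in for the expression data: 20 "genes" (rows), 10 samples.
set.seed(1)
x <- matrix(rnorm(20 * 10), nrow = 20)

# Hypothetical per-pair routine; replace with the real calculation.
pair_stat <- function(xi, xj) sum(xi * xj)

# One task per outer-loop index i; the inner loop over j > i runs
# inside the task, so only the upper triangle is computed.
# (Arguments are deliberately not named 'x' or 'fun', which would clash
# with argument names used internally by parLapply.)
row_task <- function(i, mat, stat) {
    js <- seq_len(nrow(mat))
    js <- js[js > i]
    vapply(js, function(j) stat(mat[i, ], mat[j, ]), numeric(1))
}

cl <- makeCluster(2)                 # two local worker processes
res <- parLapply(cl, seq_len(nrow(x) - 1), row_task,
                 mat = x, stat = pair_stat)
stopCluster(cl)

# The serial equivalent, for comparison.
serial <- lapply(seq_len(nrow(x) - 1), row_task,
                 mat = x, stat = pair_stat)
stopifnot(isTRUE(all.equal(res, serial)))
```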

Multi-threaded ATLAS (search for ATLAS in the R Installation and
Administration Guide) and the experimental package pnmath (see a
thread (oops, pun) starting in June with subject Parallel R, for
instance) provide parallelism at a finer grain, i.e., the level of
linear algebra (ATLAS) or R's math library (pnmath).

> system running RHEL5, so if I could use hyperthreading to increase the
> number of (virtual) nodes that would be great too.

The snow-like solutions allow you to launch as many instances of R as
you like (e.g., one per CPU); each operates quasi-independently. Each
instance of R uses its own memory, and for big memory problems this
might limit the number of instances per machine.
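
A rough way to pick the number of instances is to take the smaller of
the CPU count and what memory allows. This is only a sketch: it
assumes the 'parallel' package from current R for detectCores(), which
by default counts logical CPUs (so hyperthreads are included), and
the memory figures are hypothetical placeholders.

```r
library(parallel)

n_cores <- detectCores()        # logical CPUs; can be NA on some platforms
if (is.na(n_cores)) n_cores <- 1L

# Hypothetical figures: adjust to the machine and the data.
ram_gb        <- 16             # memory available for R workers
per_worker_gb <- 5              # data plus working space per R instance

# At least one worker, at most one per CPU, and no more than fit in RAM.
n_workers <- max(1L, min(n_cores, floor(ram_gb / per_worker_gb)))
n_workers
```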

ATLAS / pnmath make much better use of resources and work without code
modification. But these solutions only provide benefit when the
calculations are appropriately numerical; many calculations are not
formulated in a way that would take advantage of this.
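
To illustrate what "appropriately numerical" can mean, here is a
sketch in which a nested loop over pairs of rows is rewritten as one
matrix product, which R passes to the BLAS (and so to ATLAS, if R is
built against it). The inner product is a hypothetical example
statistic; whether the real routine can be recast this way depends on
the calculation.

```r
set.seed(1)
x <- matrix(rnorm(50 * 10), nrow = 50)   # 50 rows, 10 samples (toy data)

# Nested-loop formulation: one inner product per pair of rows.
n <- nrow(x)
loop <- matrix(0, n, n)
for (i in seq_len(n))
    for (j in seq_len(n))
        loop[i, j] <- sum(x[i, ] * x[j, ])

# Matrix formulation: a single call that is handed to the BLAS.
blas <- tcrossprod(x)                    # same as x %*% t(x)

stopifnot(isTRUE(all.equal(loop, blas)))
```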

A recent post from Prof. Ripley also mentions the benefits that come
from building R with compiler flags tuned to your chip, but I'm not
able to locate the thread at the moment.

If you're coming at this from scratch, on a Linux-based system, then
snow is probably the easiest to get going, using 'socket'-based
clusters.  I use Rmpi and, to a lesser extent, pnmath, both at least
in part because I'm interested in the C-level implementations (MPI and
OpenMP, respectively).
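
For a socket-based cluster, something along these lines might be a
starting point. It is a sketch using the 'parallel' package from
current R, where the socket type is called "PSOCK" (in snow the
corresponding type is "SOCK"). The data are copied to each worker once
with clusterExport, and the load-balanced clusterApplyLB hands out
rows as workers become free. The toy matrix and the inner-product
statistic are hypothetical.

```r
library(parallel)

set.seed(1)
x <- matrix(rnorm(20 * 10), nrow = 20)       # toy data

cl <- makeCluster(2, type = "PSOCK")         # socket cluster on this machine
clusterExport(cl, "x", envir = environment())  # copy x to every worker once

# Each task handles one row i against all rows j > i. Early rows have
# more pairs than late ones, hence the load-balanced variant.
res <- clusterApplyLB(cl, seq_len(nrow(x) - 1), function(i) {
    j <- (i + 1):nrow(x)
    drop(x[j, , drop = FALSE] %*% x[i, ])
})
stopCluster(cl)
```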

Martin

> Appreciate the help.
>
> Chris

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


