[R-sig-hpc] an opinion question

Paul Johnson pauljohn32 at gmail.com
Sun Feb 5 23:28:29 CET 2012


On Sat, Feb 4, 2012 at 5:25 PM, Hodgess, Erin <HodgessE at uhd.edu> wrote:
> Hi everyone!
>
> Here is an opinion question please:  when using R on a cluster, what is the best way to start please?
>

Hi, Erin

The answer will depend on the kind of cluster you have--the
interconnection technology.

Ours used MPI (the OpenMPI libraries) and the Rmpi package.  On top of
Rmpi come various facilitator packages like R's own new "parallel" (an
adaptation of some parts of snow), the separate snow package, and
convenience tools like snowFT or doParallel.
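
To give a flavor of what these facilitator packages look like in use,
here is a minimal sketch with the parallel package.  It assumes a local
socket cluster of two workers rather than MPI, only so that it can run
on a single machine.  The function name square_it and the worker count
are made up for illustration.

```r
library(parallel)

# Hypothetical toy task: each worker squares its input.
square_it <- function(x) x^2

# Assumption: 2 local socket (PSOCK) workers. On an MPI cluster the
# workers would typically be started through Rmpi/snow instead.
cl <- makeCluster(2)

# Spread the inputs 1..8 across the workers and collect the results.
res <- parLapply(cl, 1:8, square_it)

# Always release the workers when done.
stopCluster(cl)

print(unlist(res))
```

The same parLapply() call should work against a cluster object created
by other means, which is the main convenience these packages offer over
writing Rmpi calls by hand.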

I've felt that the best thing to do when getting started is to work
with Rmpi itself, because errors are more likely to be understandable.
But the proponents of snowFT argue that errors are less likely if you
follow their advice.

I'm accumulating lessons from the school of crashed programs here:

http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Cluster:Main

That refers to a collection of "working examples" of these, and you
would do me a favor if you checked them over and gave me feedback on
what is clear or unclear.  For me, the most difficult thing has been
understanding where the work of the OS, the cluster framework, and R
divides up.  But I'm getting closer to having reasonable writeups for
several.


The list of all the examples is just a Subversion source directory listing,
http://winstat.quant.ku.edu/svn/hpcexample/trunk/

I would like to insert a "roadmap" message at the top of that page,
but I have to do some work with the web server before that is allowed.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex50-R-serial/
Runs one R job (a single program) on the cluster

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex51-R-ManySerialJobs/
Sends many separate R jobs out into the cluster

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex53-HelloWorldRmpi/
Basics of Rmpi usage

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex60-HelloWorldSnow/
Shows the same basics with the snow package

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex61-HelloWorldSnowFT/
If you wonder what snowFT does differently, see the README

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex65-R-parallel/
R 2.14 introduced the parallel package, and this example tests it out.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/
Do you need separate seeds within each run of a simulation? This helps by
creating a seed archive file that the repetitions can draw on.

http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex80-PrevSci2007/
This is inspired by a negative reaction I had to a published paper.
I'll replicate that paper and see what's right and what's wrong.
It has notes and advice for my class about re-designing an ordinary
"run in one system" R simulation program into a "run across the
cluster" program.  It steps through successive versions.
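
For the seed archive idea mentioned under Ex66 above, here is a rough
sketch of one way it could be done.  This is not the code in Ex66.  It
assumes the L'Ecuyer-CMRG generator and nextRNGStream() from the
parallel package; the names n_reps, seed_file and run_rep are
hypothetical.

```r
library(parallel)

n_reps <- 5  # hypothetical number of repetitions

# Use the L'Ecuyer-CMRG generator, which supports separate streams.
RNGkind("L'Ecuyer-CMRG")
set.seed(12345)

# Build one stream seed per repetition by stepping stream to stream.
seeds <- vector("list", n_reps)
s <- .Random.seed
for (i in seq_len(n_reps)) {
  seeds[[i]] <- s
  s <- nextRNGStream(s)
}

# Archive the seeds so any repetition can be re-run later.
seed_file <- tempfile(fileext = ".rds")  # hypothetical location
saveRDS(seeds, seed_file)

# A repetition restores its own seed before drawing random numbers.
run_rep <- function(i, seed_list) {
  assign(".Random.seed", seed_list[[i]], envir = globalenv())
  rnorm(3)
}

archived <- readRDS(seed_file)
first  <- run_rep(2, archived)
second <- run_rep(2, archived)  # same repetition, same draws
```

Because each repetition looks up its seed by index, a run that crashed
on one node can be repeated in isolation and should reproduce the same
random draws.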


-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas


