[R-sig-hpc] an opinion question
Paul Johnson
pauljohn32 at gmail.com
Sun Feb 5 23:28:29 CET 2012
On Sat, Feb 4, 2012 at 5:25 PM, Hodgess, Erin <HodgessE at uhd.edu> wrote:
> Hi everyone!
>
> Here is an opinion question please: when using R on a cluster, what is the best way to start please?
>
Hi, Erin
The answer will depend on the kind of cluster you have, particularly its
interconnection technology. Ours uses MPI (the OpenMPI libraries) and the
Rmpi package. On top of Rmpi sit various facilitator packages: R's own new
"parallel" package (an adaptation of parts of snow), the separate snow
package, and convenience tools like snowFT or doParallel.
When getting started, I've felt the best approach is to work with Rmpi
itself, because its errors are more likely to be understandable. The
proponents of snowFT, however, argue that errors are less likely in the
first place if you follow their advice.
I'm accumulating lessons from the school of crashed programs here:
http://web.ku.edu/~quant/cgi-bin/mw1/index.php?title=Cluster:Main
That page refers to a collection of "working examples" of these, and you
would do me a favor if you checked them over and told me what is clear or
unclear. For me, the most difficult thing has been understanding where the
responsibilities of the OS, the cluster framework, and R divide. But I'm
getting closer to having reasonable writeups for several of them.
The full list of examples is just a Subversion source directory listing:
http://winstat.quant.ku.edu/svn/hpcexample/trunk/
I would like to add a "roadmap" message at the top of that page, but I
have to do some work on the web server before that is possible.
http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex50-R-serial/
Runs one R job (a single program) on the cluster
http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex51-R-ManySerialJobs/
Sends many separate R jobs out into the cluster
http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex53-HelloWorldRmpi/
Basics of Rmpi usage
http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex60-HelloWorldSnow/
Shows a similar example using the snow package
http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex61-HelloWorldSnowFT/
If you wonder what snowFT does differently, see the README
http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex65-R-parallel/
R 2.14 introduced the parallel package; this example tests it out.
http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex66-ParallelSeedPrototype/
Do you need separate seeds within each run of a simulation? This helps by
creating a seed archive file that the repetitions can draw on.
http://winstat.quant.ku.edu/svn/hpcexample/trunk/Ex80-PrevSci2007/
This was inspired by a negative reaction I had to a published paper.
I'll replicate that paper to see what's right and what's wrong.
It has notes and advice for my class about redesigning an ordinary
"run in one system" R simulation program into a "run across the cluster"
program, stepping through successive versions.
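The examples above are written in R. As a language-neutral illustration of the idea behind Ex66 and Ex80, here is a minimal Python sketch, not the code in those directories. It assumes a JSON seed archive file and uses a local process pool as a stand-in for cluster workers. The names write_seed_archive, one_replication, and run_simulation are hypothetical. The key redesign step is that each replication becomes a function of its index and its archived seed, so the results do not depend on which worker ran it. Caveat: seeds drawn from one master generator are not guaranteed to give non-overlapping streams (R's parallel package offers L'Ecuyer-CMRG streams for that purpose).

```python
"""Hypothetical Python analog of a seed-archive simulation (not the R examples)."""
import json
import random
import tempfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def write_seed_archive(path, n_reps, master_seed=12345):
    # Assumption: one seed per replication, drawn from a master RNG and saved as JSON
    rng = random.Random(master_seed)
    seeds = [rng.randrange(2**31) for _ in range(n_reps)]
    Path(path).write_text(json.dumps(seeds))
    return seeds


def one_replication(rep, seed):
    # Each replication builds its own RNG from its archived seed, so its
    # result depends only on (rep, seed), not on which worker runs it
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return rep, sum(draws) / len(draws)


def run_simulation(seed_file, n_workers=None):
    # n_workers=None runs serially (handy for debugging); otherwise a process pool
    seeds = json.loads(Path(seed_file).read_text())
    reps = range(len(seeds))
    if n_workers is None:
        return dict(map(one_replication, reps, seeds))
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(one_replication, reps, seeds))


if __name__ == "__main__":
    seed_file = Path(tempfile.gettempdir()) / "seed_archive.json"
    write_seed_archive(seed_file, n_reps=8)
    serial = run_simulation(seed_file)
    parallel = run_simulation(seed_file, n_workers=4)
    # Same seeds in, same answers out, whether or not the work was distributed
    assert serial == parallel
    print(parallel)
```

Running serially first and comparing against the pooled run is one cheap way to confirm that distributing the work did not change the answers.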
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas