[R-sig-hpc] Followup about mpirun and R interaction
Paul Johnson
pauljohn32 at gmail.com
Mon Mar 29 21:36:34 CEST 2010
We run Rocks Cluster Linux with the Torque scheduler.
I've been testing various ways of running R with MPI. While
Googling, I've found countless ways of
submitting R batch jobs on a cluster, whether with R itself, Rscript,
or littler. I've been puzzled that some submission scripts using
OpenMPI, for example, do not include "mpirun" or "orterun" in the
submission command, while others do.
While experimenting with R snow and doMPI packages this weekend, I've
formed a hunch and I wonder if you think it might be correct. I'll
post the full working examples on my website once I think I've got
this mostly not-wrong.
Here's a submit script I use on a system where Rmpi is compiled
against OpenMPI-1.4.1.
============================================
#!/bin/sh
#
#This is an example script example.sh
#
#These commands set up the Grid Environment for your job:
#PBS -N SnowHelloWorld
#PBS -l nodes=10:ppn=4
#PBS -l walltime=00:10:00
#PBS -M pauljohn at ku.edu
#PBS -m bea
cd $PBS_O_WORKDIR
### THIS WORKS:
###orterun -np 1 R --no-save --vanilla -f snow-hello.R
### And this also works:
R --no-save --vanilla -f snow-hello.R
### THIS FAILS:
### orterun R --no-save --vanilla -f snow-hello.R
============================================
The output from the first two is essentially the same. A job gets sent
to the compute nodes, the nodes report back, and the output is
collected up.
My theory is that when I have the "-np 1" option on orterun, it
basically has "no effect" because orterun is not asked to do any of
the work it would usually do if it were launched with lots of
processors. Inside the script, in the example below using snow, the
makeCluster() command spawns out the work.
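As a rough illustration of the single-master launch described above, the sketch below counts the slots Torque granted (Torque writes one line per allocated slot to the file named by $PBS_NODEFILE) and sets one aside for the master R process. The helper name count_slots, the variable names, and the stand-in node file with its hostnames are assumptions made for illustration; they are not part of the original scripts.

```shell
#!/bin/sh
# Sketch only: count allocated slots, leaving one for the master R process.

count_slots() {
    # One non-empty line per allocated slot in the node file.
    grep -c . "$1"
}

# Outside a Torque job PBS_NODEFILE is unset, so build a stand-in file
# (hypothetical hostnames) to keep the sketch runnable anywhere.
if [ -z "${PBS_NODEFILE:-}" ]; then
    PBS_NODEFILE=$(mktemp)
    printf 'compute-0-1\ncompute-0-1\ncompute-0-2\ncompute-0-2\n' > "$PBS_NODEFILE"
fi

NSLOTS=$(count_slots "$PBS_NODEFILE")
NWORKERS=$((NSLOTS - 1))   # one slot is used by the master itself
echo "slots=$NSLOTS workers=$NWORKERS"

# Inside a real job, the launch line from the submit script would follow:
#   orterun -np 1 R --no-save --vanilla -f snow-hello.R
```

With nodes=10:ppn=4 the node file should hold 40 lines, so a worker count taken from it would stay within the allocation rather than being hard-coded in the R script.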
If I don't put the "-np 1" option in the orterun command, the program
crashes. I do see a lot of output about starting R on lots and lots
of systems, but there's a crash that is hard to diagnose (still trying
to figure it out).
If I'm not using snow, but rather just plain "bare" Rmpi, the program
runs with the "-np 1" option for orterun, but when I take that out,
an error file is generated for each compute node. They have
names like this:
compute-0-9.5393+1.30005.log
compute-0-4.29848+1.27497.log
compute-0-9.5395+1.29951.log
with contents like this:
Error in f(libname, pkgname) : ignoring SIGPIPE signal
Warning message:
Rmpi cannot be loaded
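When many of these files pile up, it may help to see which hosts reported the failure. The sketch below counts, per host, the log files that contain the "Rmpi cannot be loaded" message. It assumes the host name is everything before the first dot in the file name, as in the names listed above; the function name failed_hosts is hypothetical.

```shell
#!/bin/sh
# Sketch only: tally per-host log files that report the Rmpi load failure.

failed_hosts() {
    dir=$1
    # grep -l prints the names of the files that contain the message.
    grep -l "Rmpi cannot be loaded" "$dir"/*.log 2>/dev/null |
        while IFS= read -r path; do
            name=${path##*/}       # drop the directory part
            echo "${name%%.*}"     # keep the text before the first dot
        done | sort | uniq -c
}

# Default to the current directory (the job's working directory).
failed_hosts "${1:-.}"
```

Each output line is a count followed by a host name; a directory with no matching logs produces no output.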
I'm attaching the examples for these now, just for the record.
Mostly I'm wondering if I'm alone in my puzzlement over how R and MPI
connect to each other in a submission script.
Or, if you do use orterun with more than 1 process in your submission
script, could I see your R script?
pj
============Example program using snow=======================
$ cat snow-hello.R
### Paul Johnson
### 2010-03-25
### Demonstration of SNOW "Simple Network of Workstations" using MPI
### "Message Passing Interface" (OpenMPI implementation)
library(snow)
p <- rnorm(123, m=33)
cl <- makeCluster(107, type="MPI")
### sends function to each system
clusterCall( cl, function() Sys.info()[c("nodename","machine")])
clusterCall( cl, function() rnorm(1, 33,1 ) )
myNorms <- matrix( rnorm(1000), ncol=10 )
## goes column by column
mypapply <- parApply(cl, myNorms, 2, print )
attributes(mypapply)
mypapply <- parApply(cl, myNorms, 2, mean )
mypapply
myNorms <- matrix( rnorm(10000), ncol=100)
mySum <- function( v ){
    s <- Sys.info()[c("nodename")]
    ms <- sum(v)
    list(s, ms)
}
mypcapply <- parApply(cl, myNorms, 2, mySum)
mypcapply
myNorms <- matrix(rnorm(250000), ncol=250)
myMeans <- function(v){
    s <- Sys.info()[c("nodename")]
    ms <- mean(v)
    list(s, ms)
}
mypcapply <- parApply(cl, myNorms, 2, myMeans )
mypcapply
stopCluster(cl)
mpi.quit()
=================================================
Example that uses Rmpi:
===================$ cat sub-hello.sh
#!/bin/sh
#
#This is an example script example.sh
#
#These commands set up the Grid Environment for your job:
#PBS -N RmpiHelloWorld
#PBS -l nodes=10:ppn=4
#PBS -l walltime=00:10:00
#PBS -M pauljohn at ku.edu
#PBS -m bea
#PBS -q default
cd $PBS_O_WORKDIR
## Works
##orterun -np 1 R --vanilla --no-save -f mpi-hello.R
###Fails
###orterun R --vanilla --no-save -f mpi-hello.R
## Works:
R --vanilla --no-save -f mpi-hello.R
===============================
===================$ cat mpi-hello.R
if (!is.loaded("mpi_initialize")){
    library(Rmpi)
}
## see http://math.acadiau.ca/ACMMaC/Rmpi/sample.html
# mpi.spawn.Rslaves() with no arguments spawns as many slaves as possible
### Try 18 worker processes; set to a smaller number
### to demonstrate the load balancing option below
mpi.spawn.Rslaves(nslaves=18)
### This will take as many slaves as there are nodes x np
## mpi.spawn.Rslaves()
# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...)
.Last <- function(){
    if (is.loaded("mpi_initialize")){
        if (mpi.comm.size(1) > 0){
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
    }
}
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
getSampleMean <- function(n=100, m=44, sd=33){
    mysys <- Sys.info()[c("nodename")]
    x <- rnorm(n=n, m=m, sd=sd)
    mx <- mean(x)
    list(mysys, x, mx)
}
mpimeans <- mpi.apply(1:17, getSampleMean, n=100, m=33)
mpimeans
### following should cause error, asks for too many nodes
##mpimeans2 <- mpi.apply(1:100, getSampleMean)
##mpimeans2
### That error kills the whole program if in batch mode
## more concise output
getSampleMean <- function(n=1000, m=88, sd=18){
    mysys <- Sys.info()[c("nodename")]
    x <- rnorm(n=n, m=m, sd=sd)
    mx <- mean(x)
    sdx <- sd(x)
    c(mysys, length(x), mx, sdx)
}
### LB "Load Balancing" handles it
mpimeans3 <- mpi.applyLB(1:100, getSampleMean, n=1500, m=66)
mpimeans3
mpi.close.Rslaves()
mpi.quit()
===========================
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas