[R-sig-hpc] snow uses fewer nodes than are available?

epaiv01 martin.ivanov at ifg.uni-tuebingen.de
Sun Apr 21 14:25:24 CEST 2013


Dear all,

I want to run a task in parallel on 3 nodes with 2 cores per node, so I
request 4 nodes in my PBS script, 1 more for the master process. 3 slave
processes get spawned successfully, but only two of the available nodes
are actually used. The third node, which should host the third slave
process, is ignored. Here is my setup:
testNodes.R:

library(snow);
library(Rmpi);
cl <- makeCluster(spec=3L, type="MPI", outfile=""); # 3 slave nodes are created
x <- seq_len(20L);
y <- clusterApply(cl=cl, x=x, fun=function(x)
    list(sysInfo=Sys.info()[c("nodename","machine")], x=x));
save(x,y, file="/home-link/epaiv01/test.RData");
stopCluster(cl=cl);
mpi.quit()

and testNodes.pbs:

#!/bin/bash
#PBS -l nodes=4:ppn=2
#PBS -l walltime=00:01:00
#PBS -l pmem=100kb

. /$HOME/.bashrc

cd $PBS_O_WORKDIR

echo "My machine will have the following nodes:"
echo "-----------------------------------------"
cat ${PBS_NODEFILE}
echo "-----------------------------------------"

mpirun -np 1 -hostfile $PBS_NODEFILE /home-link/epaiv01/system/usr/bin/R \
    --no-save < testNodes.R
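Because a PBS node file lists each host once per allocated core (ppn), the number of MPI slots each host provides can be summarized with standard tools. A minimal sketch that could be added to the job script (the helper name `slots_per_host` is hypothetical, not part of PBS or MPI):

```shell
#!/bin/sh
# Hypothetical helper: print "host slots" for each distinct host in a
# PBS node file. The node file repeats a host once per allocated core,
# so the count per host is the number of slots that host provides.
slots_per_host() {
  sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Only runs inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
  slots_per_host "$PBS_NODEFILE"
fi
```

For the node file shown in the output below, this would report 2 slots for each of the four hosts.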

Here is an excerpt of the output (the *pbs.o* file):

My machine will have the following nodes:
-----------------------------------------
n030203
n030203
n020207
n020207
n020209
n020209
n020206
n020206
-----------------------------------------

 > cl <- makeCluster(spec=3L, type="MPI", outfile=""); # 3 slave nodes are created
         3 slaves are spawned successfully. 0 failed.
starting MPI worker
starting MPI worker
starting MPI worker
 > x <- seq_len(20L);
 > y <- clusterApply(cl=cl, x=x, fun=function(x) list(sysInfo=Sys.info()[c("nodename","machine")], x=x));
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
 > save(x,y, file="/home-link/epaiv01/test.RData");
 > stopCluster(cl=cl);
Type: DONE
Type: DONE
Type: DONE
[1] 1
 > mpi.quit()


But when I load the test.RData file, the object y, where I saved the 
node info, is:
[[1]]
[[1]]$sysInfo
  nodename   machine
"n030203"  "x86_64"

[[1]]$x
[1] 1


[[2]]
[[2]]$sysInfo
  nodename   machine
"n020207"  "x86_64"

[[2]]$x
[1] 2


[[3]]
[[3]]$sysInfo
  nodename   machine
"n020207"  "x86_64"

[[3]]$x
[1] 3


and this pattern repeats up to element twenty. So only the nodes
"n030203" and "n020207" are used; n020209 and n020206, which are also
available, never appear. Of course one of them is reserved for the
master process, but what about the other?
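One hypothesis consistent with the output above, and unverified here: the MPI launcher hands out slots in node-file order (by-slot mapping), and the spawned workers take the slots that follow the master's. Under that assumption the master and the first worker share the two slots on n030203, and workers 2 and 3 fill n020207. The hypothetical helper `place_procs` below only models that assumed placement from the node file; it does not query MPI or Rmpi.

```shell
#!/bin/sh
# Hypothetical sketch: show where processes would land IF slots are
# filled strictly in node-file order. Process 0 is the R master,
# processes 1..N are the spawned snow workers. This models an assumed
# mapping only; it does not ask MPI where anything actually runs.
place_procs() {
  nodefile=$1
  nprocs=$2
  # Take the first nprocs slots (fewer if the file is shorter).
  head -n "$nprocs" "$nodefile" |
    awk '{ printf "process %d -> %s\n", NR - 1, $1 }'
}

# Only runs inside a PBS job: 1 master + 3 workers, as in testNodes.R.
if [ -n "${PBS_NODEFILE:-}" ]; then
  place_procs "$PBS_NODEFILE" 4
fi
```

If the actual placement differs from this model, the mapping options of the MPI implementation in use are the place to look.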

I have no idea what is wrong. Any suggestions will be appreciated.

Best regards,

Martin


-- 
Dr. Martin Ivanov
Eberhard-Karls-Universität Tübingen
Mathematisch-Naturwissenschaftliche Fakultät
Fachbereich Geowissenschaften
Water & Earth System Science (WESS)
Hölderlinstraße 12, 72074 Tübingen, Deutschland
Tel. +4970712974213


