[R-sig-hpc] snow uses less nodes than the available?
epaiv01
martin.ivanov at ifg.uni-tuebingen.de
Sun Apr 21 14:25:24 CEST 2013
Dear all,
I want to run a task in parallel on 3 nodes with 2 cores per node, so I
request 4 nodes in my PBS script, one extra for the master process. The
3 slave processes are spawned successfully, but the problem is that they
run on only two of the available nodes. The third node, which should host
the third slave process, is ignored. Here is my setup:
testNodes.R:
library(snow);
library(Rmpi);
cl <- makeCluster(spec=3L, type="MPI", outfile=""); # 3 slave nodes are created
x <- seq_len(20L);
y <- clusterApply(cl=cl, x=x, fun=function(x)
    list(sysInfo=Sys.info()[c("nodename","machine")], x=x));
save(x,y, file="/home-link/epaiv01/test.RData");
stopCluster(cl=cl);
mpi.quit()
and testNodes.pbs:
#!/bin/bash
#PBS -l nodes=4:ppn=2
#PBS -l walltime=00:01:00
#PBS -l pmem=100kb
. "$HOME/.bashrc"
cd $PBS_O_WORKDIR
echo "My machine will have the following nodes:"
echo "-----------------------------------------"
cat ${PBS_NODEFILE}
echo "-----------------------------------------"
mpirun -np 1 -hostfile $PBS_NODEFILE /home-link/epaiv01/system/usr/bin/R \
    --no-save < testNodes.R
Here is an excerpt of the output (the *pbs.o* file):
My machine will have the following nodes:
-----------------------------------------
n030203
n030203
n020207
n020207
n020209
n020209
n020206
n020206
-----------------------------------------
> cl <- makeCluster(spec=3L, type="MPI", outfile=""); # 3 slave nodes are created
3 slaves are spawned successfully. 0 failed.
starting MPI worker
starting MPI worker
starting MPI worker
> x <- seq_len(20L);
> y <- clusterApply(cl=cl, x=x, fun=function(x)
list(sysInfo=Sys.info()[c("nodename","machine")], x=x));
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
Type: EXEC
> save(x,y, file="/home-link/epaiv01/test.RData");
> stopCluster(cl=cl);
Type: DONE
Type: DONE
Type: DONE
[1] 1
> mpi.quit()
But when I load the test.RData file, the object y, where I saved the
node info, is:
[[1]]
[[1]]$sysInfo
nodename machine
"n030203" "x86_64"
[[1]]$x
[1] 1
[[2]]
[[2]]$sysInfo
nodename machine
"n020207" "x86_64"
[[2]]$x
[1] 2
[[3]]
[[3]]$sysInfo
nodename machine
"n020207" "x86_64"
[[3]]$x
[1] 3
and this pattern repeats for all twenty elements. So only the nodes
"n030203" and "n020207" appear in the results. Neither n020209 nor
n020206 shows up, although both are also available. Of course one of them
is reserved for the master process, but what about the other?
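To make the mismatch explicit, here is a minimal shell sketch that tallies the slots each host offers in the node list and prints the hosts that never show up in y. The host lists are copied by hand from the output above; the helper names count_slots and unused_hosts are hypothetical and only serve this illustration. Inside a real job, the list could presumably be read from $PBS_NODEFILE instead.

```shell
#!/bin/sh
# Node list as printed by `cat ${PBS_NODEFILE}` in the job output above
# (copied by hand here; this is an illustration, not part of the job).
available="n030203 n030203 n020207 n020207 n020209 n020209 n020206 n020206"
# Hosts that actually appear in the saved object y.
used="n030203 n020207"

# Hypothetical helper: print "<host> <slots>" for each distinct host,
# in the order the hosts first appear in the list.
count_slots() {
    printf '%s\n' $1 | awk '!($1 in n) { order[++k] = $1 }
        { n[$1]++ }
        END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}

# Hypothetical helper: print each distinct host from the first list
# that does not occur in the second list.
unused_hosts() {
    for h in $(printf '%s\n' $1 | awk '!seen[$0]++'); do
        case " $2 " in
            *" $h "*) ;;                  # host was used, skip it
            *) printf '%s\n' "$h" ;;      # host never reported by a worker
        esac
    done
}

count_slots "$available"
unused_hosts "$available" "$used"
```

For the lists above, this reports two slots for each of the four hosts (matching ppn=2) and names n020209 and n020206 as the hosts that no slave process reported.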
I have no idea what is wrong. Any suggestions will be appreciated.
Best regards,
Martin
--
Dr. Martin Ivanov
Eberhard-Karls-Universität Tübingen
Mathematisch-Naturwissenschaftliche Fakultät
Fachbereich Geowissenschaften
Water & Earth System Science (WESS)
Hölderlinstraße 12, 72074 Tübingen, Deutschland
Tel. +4970712974213