[R-sig-hpc] snow uses less nodes than the available?

epaiv01 martin.ivanov at ifg.uni-tuebingen.de
Sun Apr 21 17:36:21 CEST 2013


Dear all,
I solved the problem myself. I had not fully understood the concepts of 
nodes and processes in snow.
In snow, a "node" is actually a process, not a physical machine. So the 
problem is solved once I specify 7 instead of 3 as the spec parameter to 
makeCluster.
I have reserved 8 processes from the cluster (4 nodes with 2 processes 
each), so 1 is for the master R process started by Open MPI and 7 are 
spawned as slaves by snow.
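
The arithmetic above can also be done in the job script instead of being hard-coded. The following is only a minimal sketch, assuming that $PBS_NODEFILE contains one line per reserved process slot (as in the node listing quoted below); the function name count_workers is hypothetical and not part of snow, Rmpi, or PBS.

```shell
# Hypothetical helper: derive the number of snow workers from a PBS node file.
# Assumption: the file has one line per reserved process slot.
count_workers() {
    nodefile="$1"
    total=$(grep -c . "$nodefile")   # non-empty lines = reserved slots
    echo $(( total - 1 ))            # one slot is kept for the master R process
}
```

For the reservation nodes=4:ppn=2 the node file has 8 lines, so count_workers "$PBS_NODEFILE" prints 7, which is the value to pass as spec to makeCluster.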

Thank you all for your attention, and special thanks to Prof. Tierney for 
providing the snow package for R.

Best regards

Martin

On 04/21/2013 02:25 PM, epaiv01 wrote:
> Dear all,
>
> I want to run a task in parallel on 3 nodes with 2 cores per node. So 
> I request 4 nodes in my PBS script,
> 1 more for the master process. 3 processes get spawned successfully, 
> but the problem is that only two of the available nodes are
> used. The third available node, intended for the third slave process, 
> is ignored. Here is my setup:
> testNodes.R:
>
> library(snow);
> library(Rmpi);
> cl <- makeCluster(spec=3L, type="MPI", outfile=""); # 3 slave nodes are created
> x <- seq_len(20L);
> y <- clusterApply(cl=cl, x=x, fun=function(x) 
> list(sysInfo=Sys.info()[c("nodename","machine")], x=x));
> save(x,y, file="/home-link/epaiv01/test.RData");
> stopCluster(cl=cl);
> mpi.quit()
>
> and testNodes.pbs:
>
> #!/bin/bash
> #PBS -l nodes=4:ppn=2
> #PBS -l walltime=00:01:00
> #PBS -l pmem=100kb
>
> . /$HOME/.bashrc
>
> cd $PBS_O_WORKDIR
>
> echo "My machine will have the following nodes:"
> echo "-----------------------------------------"
> cat ${PBS_NODEFILE}
> echo "-----------------------------------------"
>
> mpirun -np 1 -hostfile $PBS_NODEFILE /home-link/epaiv01/system/usr/bin/R --no-save < testNodes.R
>
> Here is an excerpt of the output (the *pbs.o* file):
>
> My machine will have the following nodes:
> -----------------------------------------
> n030203
> n030203
> n020207
> n020207
> n020209
> n020209
> n020206
> n020206
> -----------------------------------------
>
> > cl <- makeCluster(spec=3L, type="MPI", outfile=""); # 3 slave nodes are created
>         3 slaves are spawned successfully. 0 failed.
> starting MPI worker
> starting MPI worker
> starting MPI worker
> > x <- seq_len(20L);
> > y <- clusterApply(cl=cl, x=x, fun=function(x) 
> list(sysInfo=Sys.info()[c("nodename","machine")], x=x));
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> Type: EXEC
> > save(x,y, file="/home-link/epaiv01/test.RData");
> > stopCluster(cl=cl);
> Type: DONE
> Type: DONE
> Type: DONE
> [1] 1
> > mpi.quit()
>
>
> But when I load the test.RData file, the object y, where I saved the 
> node info, is:
> [[1]]
> [[1]]$sysInfo
>  nodename   machine
> "n030203"  "x86_64"
>
> [[1]]$x
> [1] 1
>
>
> [[2]]
> [[2]]$sysInfo
>  nodename   machine
> "n020207"  "x86_64"
>
> [[2]]$x
> [1] 2
>
>
> [[3]]
> [[3]]$sysInfo
>  nodename   machine
> "n020207"  "x86_64"
>
> [[3]]$x
> [1] 3
>
>
> and this pattern repeats up to element twenty. So only the nodes 
> "n030203" and "n020207" are used. Nothing is reported from n020209 and
> n020206, which are also available. Of course one of them is reserved 
> for the master process, but what about the other?
>
> I have no idea what is wrong. Any suggestions will be appreciated.
>
> Best regards,
>
> Martin
>
>


-- 
Dr. Martin Ivanov
Eberhard-Karls-Universität Tübingen
Mathematisch-Naturwissenschaftliche Fakultät
Fachbereich Geowissenschaften
Water & Earth System Science (WESS)
Hölderlinstraße 12, 72074 Tübingen, Deutschland
Tel. +4970712974213


