[R-sig-hpc] Rmpi working with OpenMPI and PBSPro but snow fails

Huw Lynes lynesh at cardiff.ac.uk
Mon Mar 9 10:46:23 CET 2009


On Thu, 2009-03-05 at 07:45 -0600, luke at stat.uiowa.edu wrote:
> On Thu, 5 Mar 2009, Huw Lynes wrote:
> 
> >
> > OK results of some more testing. RMPISNOW is looking for
> > OMPI_MCA_ns_nds_vpid which OpenMPI 1.3 doesn't seem to export. It does
> > export OMPI_MCA_orte_ess_vpid. So I've altered RMPISNOW to look for
> > that.
> 
> Thanks -- I will update RMPISNOW to check for both.
> 
> >
> > This results in one R process being launched as --no-save and the others
> > being launched as --slave.
> >
> > However the slave processes all exit leaving the master hanging there
> > with nothing to do.
> >
> > To debug I replaced --slave with --verbose --no-save
> > The last thing the worker processes print is
> >
> > >R_ReplConsole(): before "for(;;)" {main.c}
> 
> Do you have the full output from the worker processes (stdout and stderr)?
> 
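
The quoted fix, which has RMPISNOW accept both OMPI_MCA_ns_nds_vpid and OMPI_MCA_orte_ess_vpid, can be illustrated with a minimal POSIX shell sketch. This is not the actual RMPISNOW source. The function names mpi_rank and r_mode are hypothetical. The sketch only reports which R mode a given rank would use (rank 0 as the --no-save master, every other rank as a --slave worker) and does not launch R.

```shell
#!/bin/sh
# Hypothetical sketch, not the real RMPISNOW script.

# Print this process's MPI rank. Try the variable RMPISNOW originally
# read (OMPI_MCA_ns_nds_vpid) first, then fall back to the one that
# OpenMPI 1.3 exports (OMPI_MCA_orte_ess_vpid).
mpi_rank() {
    echo "${OMPI_MCA_ns_nds_vpid:-$OMPI_MCA_orte_ess_vpid}"
}

# Print the R option for this rank: the master (rank 0) gets --no-save,
# and every other rank runs as a --slave worker. Return non-zero if
# neither variable is set.
r_mode() {
    rank=$(mpi_rank)
    if [ -z "$rank" ]; then
        echo "no OpenMPI rank variable found" >&2
        return 1
    fi
    if [ "$rank" = "0" ]; then
        echo "--no-save"
    else
        echo "--slave"
    fi
}
```

A launcher built this way would then start something like `R $(r_mode)`. Checking the older name first keeps the existing behaviour on installations that still set it.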

Here is the full stderr as sent back by PBS.

-----------------------------------------------------------------------------
now
dyn.load("/software/applications/R/2.8.1/gnu-4.2.4/lib64/R/library/methods/libs/methods.so") ...
now
dyn.load("/software/applications/R/2.8.1/gnu-4.2.4/lib64/R/library/methods/libs/methods.so") ...
now
dyn.load("/software/applications/R/2.8.1/gnu-4.2.4/lib64/R/library/methods/libs/methods.so") ...
now
dyn.load("/software/applications/R/2.8.1/gnu-4.2.4/lib64/R/library/grDevices/libs/grDevices.so") ...
now
dyn.load("/software/applications/R/2.8.1/gnu-4.2.4/lib64/R/library/grDevices/libs/grDevices.so") ...
now
dyn.load("/software/applications/R/2.8.1/gnu-4.2.4/lib64/R/library/grDevices/libs/grDevices.so") ...
Garbage collection 1 = 0+0+1 (level 2) ... 
5.2 Mbytes of cons cells used (27%)
0.9 Mbytes of vectors used (14%)
Garbage collection 1 = 0+0+1 (level 2) ... 
5.2 Mbytes of cons cells used (27%)
0.9 Mbytes of vectors used (14%)
Garbage collection 1 = 0+0+1 (level 2) ... 
5.2 Mbytes of cons cells used (27%)
0.9 Mbytes of vectors used (14%)
Garbage collection 2 = 1+0+1 (level 0) ... 
5.4 Mbytes of cons cells used (29%)
0.9 Mbytes of vectors used (15%)
now
dyn.load("/software/applications/R/2.8.1/gnu-4.2.4/lib64/R/library/stats/libs/stats.so") ...
Garbage collection 2 = 1+0+1 (level 0) ... 
5.4 Mbytes of cons cells used (29%)
0.9 Mbytes of vectors used (15%)
now
dyn.load("/software/applications/R/2.8.1/gnu-4.2.4/lib64/R/library/stats/libs/stats.so") ...
Garbage collection 2 = 1+0+1 (level 0) ... 
5.4 Mbytes of cons cells used (29%)
0.9 Mbytes of vectors used (15%)
now
dyn.load("/software/applications/R/2.8.1/gnu-4.2.4/lib64/R/library/stats/libs/stats.so") ...
 >R_ReplConsole(): before "for(;;)" {main.c}
 >R_ReplConsole(): before "for(;;)" {main.c}
 >R_ReplConsole(): before "for(;;)" {main.c}
mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 9820 on node arccacluster268
exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
mpirun: clean termination accomplished
-------------------------------------------------------------------------------

And here is the full stdout:

-------------------------------------------------------------------------------
slave 3
slave 2
slave 1
WARNING: ignoring environment value of R_HOME

R version 2.8.1 (2008-12-22)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library("Rmpi")
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME
WARNING: ignoring environment value of R_HOME

R version 2.8.1 (2008-12-22)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


R version 2.8.1 (2008-12-22)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


R version 2.8.1 (2008-12-22)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 
> 
> 
-------------------------------------------------------------------------------

The interesting thing is that, now that I have the SOCK version working
properly, I can see what the output should be:

[[4]]
         nodename           machine 
"arccacluster268"          "x86_64" 

Now, if I look at the strace output from the slave processes in the MPI
version, I can see that they are writing this information, but it
doesn't seem to be getting back to the master. What seems to be
happening is that the cluster gets set up, and the slaves receive their
commands and run them properly, but the results never reach the master.
So the slaves exit cleanly and the master is left sitting there, waiting
for information that it will never receive.

Thanks,
Huw




-- 
Huw Lynes                       | Advanced Research Computing
HEC Sysadmin                    | Cardiff University
                                | Redwood Building, 
Tel: +44 (0) 29208 70626        | King Edward VII Avenue, CF10 3NB


