[R-sig-hpc] what causes this fork warning from openmpi

Dirk Eddelbuettel edd at debian.org
Tue Sep 28 02:49:49 CEST 2010


Paul,


On 27 September 2010 at 17:43, Paul Johnson wrote:
| Hi, I wonder if you have seen this message from openmpi-1.4.1 and R-2.11.1.
| 
| I have built a small sample program that causes this error/warning
| every time.  In fact, I still get this error/warning if I only just
| ask Rmpi to spawn the slaves.
| 
| I see the following in the ".e" (stderr) file that PBS creates automatically.
| This used to be an intermittent thing that I thought was caused by
| trying to use multicore, but now I see it every time I run the
| example.
| 
| --------------------------------------------------------------------------
| An MPI process has executed an operation involving a call to the
| "fork()" system call to create a child process.  Open MPI is currently
| operating in a condition that could result in memory corruption or
| other system errors; your MPI job may hang, crash, or produce silent
| data corruption.  The use of fork() (or system() or other calls that
| create child processes) is strongly discouraged.
| 
| The process that invoked fork was:
| 
|   Local host:          compute-2-19.local (PID 10000)
|   MPI_COMM_WORLD rank: 0
| 
| If you are *absolutely sure* that your application will successfully
| and correctly survive a call to fork(), you may disable this warning
| by setting the mpi_warn_on_fork MCA parameter to 0.
| --------------------------------------------------------------------------
| 
| I have Centos Linux ("Rocks" cluster) with openmpi-1.4.1
| 
| > sessionInfo()
| R version 2.11.1 (2010-05-31)
| x86_64-redhat-linux-gnu
| 
| locale:
|  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
|  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
|  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
|  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
|  [9] LC_ADDRESS=C               LC_TELEPHONE=C
| [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
| 
| attached base packages:
| [1] stats     graphics  grDevices utils     datasets  methods   base
| 
| 
| Small working example that causes this problem:
| 
| Submission script
| 
| ======== sub-test.sh=============================
| 
| $ cat sub-test.sh
| #!/bin/sh
| #
| #This is a submission script to batch out the full sim
| #
| #These commands set up the Grid Environment for your job:
| #PBS -N MpiParallel
| #PBS -l nodes=12:ppn=4
| #PBS -l walltime=480:00:00
| #PBS -M pauljohn at ku.edu
| #PBS -m bea
| 
| cd $PBS_O_WORKDIR
| 
| orterun --hostfile $PBS_NODEFILE -n 1 R --no-save --vanilla -f mi-test.R
| 
| ====================================================
| 
| =========mi-test.R==========================
| 
| if (!is.loaded("mpi_initialize")){
|   library(Rmpi)
| }
| 
| ## see http://math.acadiau.ca/ACMMaC/Rmpi/sample.html
| 
| ### Try worker processes;
| mpi.spawn.Rslaves(nslaves=8)
| 
| # In case R exits unexpectedly, have it automatically clean up
| # resources taken up by Rmpi (slaves, memory, etc...)
| .Last <- function(){
|   if (is.loaded("mpi_initialize")){
|     if (mpi.comm.size(1) > 0){
|       print("Please use mpi.close.Rslaves() to close slaves.")
|       mpi.close.Rslaves()
|     }
|     print("Please use mpi.quit() to quit R")
|     .Call("mpi_finalize")
|   }
| }
| 
| ### here's where I used to have mpi commands  :)
| 
| 
| mpi.close.Rslaves()
| mpi.quit()
| 
| ====================================================

That looks odd: mpi.close.Rslaves() and mpi.quit() constitute the end of the
script; they close the slaves and finalize MPI. But .Last() runs when R
exits, so it then tries to execute more MPI commands against an
already-finalized MPI.

When I run your script, I end up with a segfault.

If I comment out your .Last() routine, everything is dapper.  So if I were
you, I'd try to get by without the .Last() function.
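
For reference, here is a minimal sketch of mi-test.R without the .Last()
hook, letting mpi.quit() do the finalization; it uses only the Rmpi calls
already in your script:

  ## loading Rmpi initializes MPI
  library(Rmpi)

  ## spawn eight slave R processes, as before
  mpi.spawn.Rslaves(nslaves=8)

  ## ... your mpi.* commands go here ...

  ## shut the slaves down, then finalize MPI and exit R
  mpi.close.Rslaves()
  mpi.quit()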

FWIW I used R 2.11.1, Open MPI 1.4.1, launched via orterun and littler / r.
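
As for the fork() warning itself: you reported seeing it from
mpi.spawn.Rslaves() alone, and if you are confident it is benign in your
setup, the warning text already names the knob -- the mpi_warn_on_fork MCA
parameter. A sketch against your orterun line (untested on your Rocks
cluster) would be:

  orterun --mca mpi_warn_on_fork 0 --hostfile $PBS_NODEFILE -n 1 \
      R --no-save --vanilla -f mi-test.R

The same can be set via the environment as OMPI_MCA_mpi_warn_on_fork=0.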

Hth, Dirk


-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com


