[R-sig-hpc] what causes this fork warning from openmpi
Dirk Eddelbuettel
edd at debian.org
Tue Sep 28 02:49:49 CEST 2010
Paul,
On 27 September 2010 at 17:43, Paul Johnson wrote:
| Hi, I wonder if you have seen this message from openmpi-1.4.1 and R-2.11.1.
|
| I have built a small sample program that causes this error/warning
| every time. In fact, I still get this error/warning even if I only
| ask Rmpi to spawn the slaves.
|
| I see the following in the ".e" error file that PBS creates automatically.
| This used to be an intermittent thing that I thought was caused by
| trying to use multicore, but now I see it every time I run the
| example.
|
| --------------------------------------------------------------------------
| An MPI process has executed an operation involving a call to the
| "fork()" system call to create a child process. Open MPI is currently
| operating in a condition that could result in memory corruption or
| other system errors; your MPI job may hang, crash, or produce silent
| data corruption. The use of fork() (or system() or other calls that
| create child processes) is strongly discouraged.
|
| The process that invoked fork was:
|
| Local host: compute-2-19.local (PID 10000)
| MPI_COMM_WORLD rank: 0
|
| If you are *absolutely sure* that your application will successfully
| and correctly survive a call to fork(), you may disable this warning
| by setting the mpi_warn_on_fork MCA parameter to 0.
| --------------------------------------------------------------------------
|
| I have CentOS Linux (a "Rocks" cluster) with openmpi-1.4.1
|
| > sessionInfo()
| R version 2.11.1 (2010-05-31)
| x86_64-redhat-linux-gnu
|
| locale:
| [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
| [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
| [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
| [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
| [9] LC_ADDRESS=C LC_TELEPHONE=C
| [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
|
| attached base packages:
| [1] stats graphics grDevices utils datasets methods base
|
|
| Small working example that causes this problem:
|
| Submission script
|
| ======== sub-test.sh=============================
|
| $ cat sub-test.sh
| #!/bin/sh
| #
| #This is a submission script to batch out the full sim
| #
| #These commands set up the Grid Environment for your job:
| #PBS -N MpiParallel
| #PBS -l nodes=12:ppn=4
| #PBS -l walltime=480:00:00
| #PBS -M pauljohnku.edu
| #PBS -m bea
|
| cd $PBS_O_WORKDIR
|
| orterun --hostfile $PBS_NODEFILE -n 1 R --no-save --vanilla -f mi-test.R
|
| ====================================================
|
| =========mi-test.R==========================
|
| if (!is.loaded("mpi_intitialize")){
| library(Rmpi)
| }
|
| ## see http://math.acadiau.ca/ACMMaC/Rmpi/sample.html
|
| ### Try worker processes;
| mpi.spawn.Rslaves(nslaves=8)
|
| # In case R exits unexpectedly, have it automatically clean up
| # resources taken up by Rmpi (slaves, memory, etc...)
| .Last <- function(){
|     if (is.loaded("mpi_initialize")){
|         if (mpi.comm.size(1) > 0){
|             print("Please use mpi.close.Rslaves() to close slaves.")
|             mpi.close.Rslaves()
|         }
|         print("Please use mpi.quit() to quit R")
|         .Call("mpi_finalize")
|     }
| }
|
| ### here's where I used to have mpi commands :)
|
|
| mpi.close.Rslaves()
| mpi.quit()
|
| ====================================================
That looks odd: mpi.close.Rslaves() and mpi.quit() constitute the end; they
close MPI, and then you try to execute more MPI commands via .Last().
When I run your script, I end up with a segfault.
If I comment out your .Last() routine, everything is dapper. So if I were
you, I'd try to get by without the .Last() function.
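If you do want an emergency cleanup hook, one option is to make .Last() a
no-op once the script has already shut MPI down itself. Here is a minimal
sketch (the .rmpi.done flag is just an illustrative marker, not part of Rmpi):

    .rmpi.done <- FALSE

    .Last <- function() {
        if (!.rmpi.done && is.loaded("mpi_initialize")) {
            if (mpi.comm.size(1) > 0) {
                print("Please use mpi.close.Rslaves() to close slaves.")
                mpi.close.Rslaves()
            }
            mpi.finalize()
        }
    }

    ## ... MPI work goes here ...

    mpi.close.Rslaves()
    .rmpi.done <- TRUE   # normal shutdown: .Last() must not touch MPI again
    mpi.quit()

With the guard in place mpi.quit() still runs .Last() on the way out, but the
flag keeps it from calling into the already-finalized MPI layer.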
FWIW I used R 2.11.1, Open MPI 1.4.1, launched via orterun and littler / r.
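As for the fork warning itself: Open MPI prints it whenever a process calls
fork() after MPI_Init, which in R can happen through system(), pipe(), or
packages such as multicore. If you are confident the child processes are
harmless, the warning can be silenced for the whole job by adding
--mca mpi_warn_on_fork 0 to the orterun line. As a sketch for a single
process only, the same MCA parameter can also be set through Open MPI's
environment-variable form before Rmpi initializes MPI:

    ## Sketch only: OMPI_MCA_<name> is Open MPI's environment-variable form
    ## of an MCA parameter; it must be set before MPI is initialized, i.e.
    ## before Rmpi is loaded, and it affects only this process.
    Sys.setenv(OMPI_MCA_mpi_warn_on_fork = "0")

    if (!is.loaded("mpi_initialize")) {
        library(Rmpi)   # MPI is typically initialized when Rmpi loads
    }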
Hth, Dirk
--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com