[R-sig-hpc] what causes this fork warning from openmpi

Paul Johnson pauljohn32 at gmail.com
Tue Sep 28 05:43:28 CEST 2010

On Mon, Sep 27, 2010 at 7:49 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
> Paul,
> | --------------------------------------------------------------------------
> | An MPI process has executed an operation involving a call to the
> | "fork()" system call to create a child process.  Open MPI is currently
> | operating in a condition that could result in memory corruption or
> | other system errors; your MPI job may hang, crash, or produce silent
> | data corruption.  The use of fork() (or system() or other calls that
> | create child processes) is strongly discouraged.
> |
> | The process that invoked fork was:
> |
> |   Local host:          compute-2-19.local (PID 10000)
> |   MPI_COMM_WORLD rank: 0
> |
> | If you are *absolutely sure* that your application will successfully
> | and correctly survive a call to fork(), you may disable this warning
> | by setting the mpi_warn_on_fork MCA parameter to 0.
> | ====================================================
> That looks odd: mpi.close.Rslaves() and mpi.quit() constitute the end; they
> close MPI, and then you try to execute more MPI commands via .Last().
> When I run your script, I end up with a segfault.
> If I comment out your .Last() routine, everything is dapper.  So if I were
> you, I'd try to get by without the .Last() function.
> FWIW I used R 2.11.1, Open MPI 1.4.1, launched via orterun and littler / r.
> Hth, Dirk

Hi, Dirk:

I can't tell you how reassuring it is to send out a distress call and
find an answer so quickly.  I don't think you are correct, but I've
been wrong so often this past week that I'm second-guessing myself.
The .Last idiom you think is the source of the trouble comes from the
examples on the Acadia site--you know the one.

If the idiom were wrong, it seems somebody besides you or me would
have noticed already--it is one of the few well-documented working
examples of Rmpi that many people have run.
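For reference, the version of that idiom that circulates in the tutorials
looks roughly like this (reconstructed from memory, so treat the details
as approximate and check the original page before relying on it):

```r
# Approximate reconstruction of the commonly circulated Rmpi .Last idiom.
# It tries to shut MPI down cleanly if R exits with slaves still running.
.Last <- function() {
    if (is.loaded("mpi_initialize")) {
        if (mpi.comm.size(1) > 0) {
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R.")
        .Call("mpi_finalize")
    }
}
```

The is.loaded() and mpi.comm.size() guards are what make it (in theory)
safe to call even after a normal mpi.quit().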

I'm almost more puzzled that you get a seg fault instead of just the
scary warning.

I've been googling more and more on this, and I'm starting to suspect
I've run into a somewhat deceptive warning message from Open MPI on
heterogeneous networks that include some InfiniBand devices.
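If the warning really is benign in this situation, the message itself
names the knob to silence it.  A hedged sketch (the -np count and script
name are placeholders, and I have not tried this myself yet):

```shell
# Silence the fork() warning via the mpi_warn_on_fork MCA parameter
# named in the warning text -- only do this if you are confident the
# fork() calls in your job are actually harmless.
mpirun --mca mpi_warn_on_fork 0 -np 8 R --no-save -f myscript.R
```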


I did not connect the two things before.  In the cluster where I'm
working, the nodes named compute-0-XX and compute-1-XX are connected by
gigabit ethernet, but compute-2-XX is InfiniBand.  I can't say I've
ever had a job spill over to the InfiniBand rack before.

I think one test on my end will be to keep my jobs off the InfiniBand
nodes and see whether the same warning appears.
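One way to run that test without touching the scheduler, assuming Open
MPI's component-selection syntax (again, -np and the script name are
placeholders): exclude the openib BTL so the job uses TCP only, and see
if the warning goes away.

```shell
# Exclude the InfiniBand (openib) transport so Open MPI falls back to TCP:
mpirun --mca btl ^openib -np 8 R --no-save -f myscript.R

# Or list the allowed transports explicitly (shared memory, TCP, loopback):
mpirun --mca btl sm,tcp,self -np 8 R --no-save -f myscript.R
```

If the warning disappears under TCP-only runs, that would point at the
openib/fork interaction rather than at the .Last() idiom.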


Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
