[R-sig-hpc] Rmpi with PBSPro and OpenMPI

Lyman, Mark Mark.Lyman at atk.com
Mon Mar 9 21:47:06 CET 2009


I recently discovered this list and thought I would ask about a mildly
annoying issue. Our setup generally works well; however, I had to modify
the .Last function in the .Rprofile file that comes with Rmpi. The
function now looks like this:
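        .Last <- function ()
        {
                # Only act if the Rmpi library has been loaded
                if (is.loaded("mpi_initialize")) {
                        # More than one process in communicator 1
                        # means there are slaves to shut down
                        if (mpi.comm.size(1) > 1) {
                                # Tell every slave to quit without saving
                                mpi.bcast.cmd(q("no"))
                        }
                }
        }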

Without this modification, the R code itself runs successfully, but
everything hangs as soon as mpi.quit, mpi.exit, or mpi.finalize is
called. It seems that the slaves are not being shut down properly, so
the master never receives the signal it is waiting for to confirm that
they have exited. Has anyone else run into this and solved it? Or does
anyone know what the cause could be?
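
In case it helps anyone reproduce this, here is a minimal sketch of the
kind of script I mean. It is not our actual job: the function name
run_smoke_test is made up for illustration, and it assumes that R was
started under mpirun with the .Rprofile from Rmpi, so that the slaves
already exist in communicator 1.

```r
# Hypothetical smoke test, not our production code.
# Assumes R was started under mpirun with Rmpi's .Rprofile, so that
# rank 0 is the master and the other ranks are slaves in communicator 1.
run_smoke_test <- function(comm = 1) {
    if (!is.loaded("mpi_initialize")) {
        message("Rmpi is not loaded; nothing to test")
        return(invisible(NULL))
    }
    # Ask every slave for its rank; the R code itself runs successfully.
    ranks <- mpi.remote.exec(mpi.comm.rank(), comm = comm)
    print(ranks)
    # With the stock .Last, this is the call where everything stops.
    mpi.quit()
}

run_smoke_test()
```

Outside of an MPI session the function only prints a message and
returns, so it is safe to source on a workstation.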

I am not sure, but I suspect that this is related to the following
error that I occasionally get from OpenMPI:

[n087:30298] [0,0,0] mca_oob_tcp_recv_handler: invalid message type: 0
[n039:29963] [0,1,65]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv() failed with errno=104
[n087:30298] [0,0,0] mca_oob_tcp_recv_handler: invalid message type: 0
[n039:29962] [0,1,64]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv() failed with errno=104
[n087:30298] [0,0,0] mca_oob_tcp_recv_handler: invalid message type: 0
[n039:29964] [0,1,66]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv() failed with errno=104
[n087:30298] [0,0,0] mca_oob_tcp_recv_handler: invalid message type: 0
[n039:29965] [0,1,67]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv() failed with errno=104

Usually, I am able to kill and retry the job and everything works fine,
but sometimes it can fail repeatedly. Please let me know if any more
information is needed. As you can see, I am a statistician, and I am
very new to HPC.

Mark Lyman, Statistician
Engineering Systems & Integration, ATK
(435) 863-2863


To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.

Sir Ronald Aylmer Fisher


