[R-sig-hpc] difficulty spawning Rslaves
edd at debian.org
Mon Dec 28 17:14:08 CET 2009
On 23 December 2009 at 16:05, Allan Strand wrote:
| My setup is on a cluster running 64-bit FC. I have recently broken my
| Rmpi install (and hence snow) by upgrading some very old versions of R,
| lam/mpi, Rmpi, and snow (currently installed versions listed at the
| bottom of this email). No doubt this is a problem with my Rmpi install,
| but I'm having trouble seeing it.
| I cannot seem to spawn more than a single slave (which is spawned on the
| master node)
| > mpi.spawn.Rslaves(comm=1,nslaves=1)
| 1 slaves are spawned successfully. 0 failed.
| master (rank 0, comm 1) of size 2 is running on: node0
| slave1 (rank 1, comm 1) of size 2 is running on: node0
| > mpi.comm.free(comm=1)
| [1] 1
| > mpi.spawn.Rslaves(comm=1,nslaves=2)
| 2 slaves are spawned successfully. 0 failed.
| Error in mpi.intercomm.merge(intercomm, 0, comm) :
| MPI_Error_string: process in local group is dead
| No doubt the answer is contained in the MPI_Error string, but I'm not
| sure how to interpret it.
| Versions (all installed locally in my account with directory-appropriate
| ./configure settings):
| R 2.10.1
| LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
For what it is worth, a looong time ago (two years? longer?), when I was
helping Manuel with the Debian OpenMPI packages and transitioning off LAM
myself, I concluded that the very latest 7.1.x releases of LAM were broken
for me. The system was a then-current Ubuntu system with the LAM and OpenMPI
packages compiled from Debian sources. Provided I froze LAM at 7.1.2, things
would work; the newer releases would not.
So I'd recommend either downgrading to the last LAM that worked for you or,
better, taking the plunge and jumping to Open MPI. The 1.3.* series is pretty
good already, and 1.4.0 is just around the corner.
Just my $0.02. The problem may of course be entirely different.
Three out of two people have difficulties with fractions.