[R-sig-hpc] [R] segmentation fault with Rmpi and OpenMPI on Ubuntu 9.04
Dirk Eddelbuettel
edd at debian.org
Wed Jun 24 03:00:35 CEST 2009
Hi Mark,
On 23 June 2009 at 19:38, Mark Mueller wrote:
| PROBLEM DEFINITION --
|
| Master:
|
| - AMD_64
| - Ubuntu 9.04 64-bit
| - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
| to the localhost via ssh to run local jobs) - manually downloaded source and
| compiled
| - Rmpi 0.5-7 - package installed using R install.packages()
| - R 2.9.0 - installed using apt-get
Ok. [ I prefer to take the Debian sources for things like Open MPI and rebuild
local packages on Ubuntu, but otherwise this looks fine. ]
| Slave:
|
| - Intel Pentium 4 32-bit
| - Ubuntu 9.04 32-bit
| - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
| to the localhost via ssh to run local jobs) - manually downloaded source and
| compiled
| - Rmpi 0.5-7 - package installed using R install.packages()
| - R 2.9.0 - installed using apt-get
Same -- but I am cautious about the 32-bit / 64-bit mix. I have no experience
there: at work everything is 64-bit, at home everything is 32-bit.
| When executing the following command from the master:
|
| --> mpirun --hostfile <some file> -np 1 R CMD BATCH <some program>.R
|
| the following trace results on the master node (lines 18 and 19 are from my
| particular R program):
|
| *** caught segfault ***
| address 0x10333e4d8, cause 'memory not mapped'
|
| Traceback:
| 1: .Call("mpi_recv", x, as.integer(type), as.integer(source),
| as.integer(tag), as.integer(comm), as.integer(status), PACKAGE = "Rmpi")
| 2: mpi.recv(x = raw(charlen), type = 4, srctag[1], srctag[2], comm,
| status)
| 3: typeof(connection)
| 4: unserialize(obj)
| 5: .mpi.unserialize(mpi.recv(x = raw(charlen), type = 4, srctag[1],
| srctag[2], comm, status))
| 6: mpi.recv.Robj(node$rank, node$RECVTAG, node$comm)
| 7: recvData.MPInode(con)
| 8: recvData(con)
| 9: FUN(X[[6L]], ...)
| 10: lapply(cl[1:jobs], recvResult)
| 11: staticClusterApply(cl, fun, length(x), argfun)
| 12: clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)
| 13: is.vector(X)
| 14: lapply(args, enquote)
| 15: do.call("fun", lapply(args, enquote))
| 16: docall(c, clusterApply(cl, splitList(x, length(cl)), lapply, fun,
| ...))
| 17: snow::parLapply(snow::getMPIcluster(), object, FUN, ..., DMetaData =
| DMetaData(object))
| 18: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
| "realestate")
| 19: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
| "realestate")
| aborting ...
| Segmentation fault
In a case like this I always prefer to step back and run simple scripts (as in
my "Intro to HPC with R" tutorials). E.g., can you run
a) a simple mpiHelloWorld C program, with no other dependencies, between the
master and slave nodes? This shows basic MPI functionality.
mpiHelloWorld.c is attached. Do
$ mpicc -o mpiHelloWorld mpiHelloWorld.c
$ # cp and scp to /tmp on master and slave
$ orterun -n 4 -H master,slave /tmp/mpiHelloWorld
b) the same for a simple Rmpi script? This shows the R/MPI interaction.
Likewise, place mpiHelloWorld.r in /tmp on each machine, then
$ orterun -n 4 -H master,slave /tmp/mpiHelloWorld.r
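(As the attachments only survive as scrubbed links in the archive, here is an
illustrative sketch -- not necessarily the attached file -- of what such an
Rmpi script can look like. It assumes the file is made executable with a
littler or Rscript shebang so that orterun can start it directly, and it uses
comm 0, i.e. MPI_COMM_WORLD, as the R processes are started by orterun rather
than spawned from a master.)

#!/usr/bin/env r                   ## littler; '#!/usr/bin/env Rscript' also works
library(Rmpi)                      ## loading Rmpi initialises MPI
rank <- mpi.comm.rank(0)           ## rank of this process in MPI_COMM_WORLD
size <- mpi.comm.size(0)           ## number of processes started by orterun
host <- mpi.get.processor.name()   ## hostname this rank runs on
cat("Hello, world: rank", rank, "of", size, "on", host, "\n")
mpi.quit()                         ## finalise MPI and quit R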
c) do the same for snow, by writing a simple snow/MPI file; a minimal sketch
follows after d) below.
d) if you care for slurm, do the same with slurm to allocate the resources
within which you then run orterun to launch the R/MPI jobs.
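For c), a minimal snow-over-MPI test could look like the one below -- again only
a sketch, assuming a spawn-based snow cluster works with your Open MPI build;
the file name snowHello.R is arbitrary:

## run on the master only, e.g.
##   mpirun --hostfile <some file> -np 1 R CMD BATCH snowHello.R
library(Rmpi)                        ## MPI bindings
library(snow)                        ## cluster layer on top of Rmpi
cl <- makeCluster(2, type = "MPI")   ## spawn two R worker processes over MPI
## one hostname per worker, then a trivial distributed computation
print(clusterCall(cl, function() Sys.info()[["nodename"]]))
print(clusterApply(cl, 1:4, function(i) i * i))
stopCluster(cl)                      ## shut the workers down
mpi.quit()                           ## finalise MPI and quit R

If that runs cleanly but the tm/snow job still segfaults, I would look harder at
the 32-bit / 64-bit mix (your traceback dies inside mpi_recv/unserialize) rather
than at the basic setup.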
| CONFIGURATION STEPS TAKEN --
|
| - There is no common/shared file system mounted for the cluster.
|
| - All PATH and LD_LIBRARY_PATH environment variables for OpenMPI are
| properly set on each node (including the master).
|
| - OpenMPI was configured and built on each node with the
| --enable-heterogeneous configuration flag to account for the AMD-64 and
| Intel-32 architectures.
|
| - The R_SNOW_LIB environment variable is set properly and the RunSnowNode
| and RunSnowWorker scripts are located in the PATH (and set to executable) on
| all nodes (including the master).
|
| - All of the OpenMPI settings as documented in the OpenMPI FAQs to allow for
| remote execution (i.e. rsh/ssh, .rhosts) are in place.
|
| Any insight or assistance will be greatly appreciated.
As outlined above, I try to stick with the 'tested' configuration from the
Debian packages so I don't have to deal with all the environment variables etc.
Also, decomposing the problem down from snow to Rmpi to plain MPI may help
isolate where it breaks.
Best regards, Dirk
|
| Sincerely,
| Mark
|
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpiHelloWorld.c
Type: application/octet-stream
Size: 507 bytes
Desc: mpiHelloWorld in C
URL: <https://stat.ethz.ch/pipermail/r-sig-hpc/attachments/20090623/c35cff2c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpiHelloWorld.r
Type: application/octet-stream
Size: 230 bytes
Desc: mpiHelloWorld in R
URL: <https://stat.ethz.ch/pipermail/r-sig-hpc/attachments/20090623/c35cff2c/attachment-0001.obj>
-------------- next part --------------
--
Three out of two people have difficulties with fractions.