[R-sig-hpc] [R] segmentation fault with Rmpi and OpenMPI on Ubuntu 9.04

Dirk Eddelbuettel edd at debian.org
Wed Jun 24 03:00:35 CEST 2009


Hi Mark,

On 23 June 2009 at 19:38, Mark Mueller wrote:
| PROBLEM DEFINITION --
| 
| Master:
| 
| - AMD_64
| - Ubuntu 9.04 64-bit
| - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
| to the localhost via ssh to run local jobs) - manually downloaded source and
| compiled
| - Rmpi 0.5-7 - package installed using R install.packages()
| - R 2.9.0 - installed using apt-get

Ok. [ I prefer to take the Debian sources for Open MPI and rebuild local
packages on Ubuntu, but otherwise this looks fine. ]

| Slave:
| 
| - Intel Pentium 4 32-bit
| - Ubuntu 9.04 32-bit
| - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
| to the localhost via ssh to run local jobs) - manually downloaded source and
| compiled
| - Rmpi 0.5-7 - package installed using R install.packages()
| - R 2.9.0 - installed using apt-get

Same -- but I am cautious about the 32-bit / 64-bit mix.  I have no experience
there: at work everything is 64-bit, at home everything is 32-bit.

| When executing the following command from the master:
| 
| --> mpirun --hostfile <some file> -np 1 R CMD BATCH <some program>.R
| 
| the following trace results on the master node (lines 18 and 19 are from my
| particular R program):
| 
| *** caught segfault ***
| address 0x10333e4d8, cause 'memory not mapped'
| 
| Traceback:
|  1: .Call("mpi_recv", x, as.integer(type), as.integer(source),
| as.integer(tag),     as.integer(comm), as.integer(status), PACKAGE = "Rmpi")
|  2: mpi.recv(x = raw(charlen), type = 4, srctag[1], srctag[2], comm,
| status)
|  3: typeof(connection)
|  4: unserialize(obj)
|  5: .mpi.unserialize(mpi.recv(x = raw(charlen), type = 4, srctag[1],
| srctag[2], comm, status))
|  6: mpi.recv.Robj(node$rank, node$RECVTAG, node$comm)
|  7: recvData.MPInode(con)
|  8: recvData(con)
|  9: FUN(X[[6L]], ...)
| 10: lapply(cl[1:jobs], recvResult)
| 11: staticClusterApply(cl, fun, length(x), argfun)
| 12: clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)
| 13: is.vector(X)
| 14: lapply(args, enquote)
| 15: do.call("fun", lapply(args, enquote))
| 16: docall(c, clusterApply(cl, splitList(x, length(cl)), lapply,     fun,
| ...))
| 17: snow::parLapply(snow::getMPIcluster(), object, FUN, ..., DMetaData =
| DMetaData(object))
| 18: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
| "realestate")
| 19: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
| "realestate")
| aborting ...
| Segmentation fault

In a case like this I always prefer to step back and run simple test scripts (as
in my "Intro to HPC with R" tutorials).  E.g., can you run

a) a simple mpiHelloWorld C program with no other dependencies between master
   and slave nodes?  This tests basic MPI functionality.

   mpiHelloWorld.c is attached. Do

   $ mpicc -o mpiHelloWorld mpiHelloWorld.c 
   $ # cp and scp to /tmp on master and slave
   $ orterun -n 4 -H master,slave /tmp/mpiHelloWorld

b) a simple Rmpi script doing the same?  This shows the R/MPI interaction (a
   minimal Rmpi sketch follows after this list).

   Likewise, place mpiHelloWorld.r in /tmp on each machine, then

   $ orterun -n 4 -H master,slave /tmp/mpiHelloWorld.r

c) do the same for snow, by writing a simple snow/MPI test script (a snow
   sketch also follows after this list)

d) if you care for slurm, do the same with slurm to allocate resources within
   which you then run orterun to launch the R/MPI jobs.
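
Regarding b): since the attached mpiHelloWorld.r gets scrubbed by the list
archive, here is a minimal sketch of what such an Rmpi hello-world could look
like.  It is an illustration only (not necessarily the exact attachment) and
assumes Rmpi is installed on both nodes and the script is started directly via
orterun as shown above:

   #!/usr/bin/env Rscript
   ## Minimal Rmpi hello world -- sketch only; the attached file may differ.
   library(Rmpi)

   ## When started directly by orterun (no spawning), comm 0 is MPI_COMM_WORLD.
   rank <- mpi.comm.rank(0)
   size <- mpi.comm.size(0)
   host <- mpi.get.processor.name()
   cat(sprintf("Hello from rank %d of %d on %s\n", rank, size, host))

   ## Shut down the MPI layer and quit R.
   mpi.quit()

If every rank on both the master and the slave prints its line, basic R-level
MPI communication works across the 32-bit / 64-bit boundary.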
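
And for c), a snow-over-MPI test can be equally small.  The following is only
a sketch under a few assumptions of mine (snow and Rmpi installed on all
nodes, the cluster created from within R on the master rather than via the
RunSnowNode / RunSnowWorker scripts, and a file name, snowHello.R, that I made
up).  Start it as the original job was started, e.g.

   $ mpirun --hostfile <some file> -np 1 R CMD BATCH snowHello.R

where snowHello.R contains something like:

   ## Minimal snow/MPI test -- sketch only.
   library(Rmpi)
   library(snow)

   ## Spawn two MPI workers; adjust the count to the slots in your hostfile.
   cl <- makeMPIcluster(2)

   ## Ask each worker where it runs -- if the 32-bit / 64-bit mix is the
   ## culprit, even this trivial round-trip should already fail.
   print(clusterCall(cl, function() Sys.info()[["nodename"]]))

   ## A tiny parallel computation over the cluster.
   print(clusterApply(cl, 1:4, function(i) i * i))

   stopCluster(cl)
   mpi.quit()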

| CONFIGURATION STEPS TAKEN --
| 
| - There is no common/shared file system mounted for the cluster.
| 
| - All PATH and LD_LIBRARY_PATH environment variables for OpenMPI are
| properly set on each node (including the master).
| 
| - OpenMPI was configured and built on each node with the
| --enable-heterogeneous configuration flag to account for the AMD-64 and
| Intel-32 architectures.
| 
| - The R_SNOW_LIB environment variable is set properly and the RunSnowNode
| and RunSnowWorker scripts are located in the PATH (and set to executable) on
| all nodes (including the master).
| 
| - All of the OpenMPI settings as documented in the OpenMPI FAQs to allow for
| remote execution (i.e. rsh/ssh, .rhosts) are in place.
| 
| Any insight or assistance will be greatly appreciated.

As outlined above, I try to stick with the 'tested' configuration from the
Debian packages so I don't have to deal with all the environment variables etc.
Also, decomposing from snow down to Rmpi and then to plain MPI may help isolate
where things break.

Best regards, Dirk


| 
| Sincerely,
| Mark

[Attachment: mpiHelloWorld.c -- mpiHelloWorld in C, 507 bytes
 <https://stat.ethz.ch/pipermail/r-sig-hpc/attachments/20090623/c35cff2c/attachment.obj>]

[Attachment: mpiHelloWorld.r -- mpiHelloWorld in R, 230 bytes
 <https://stat.ethz.ch/pipermail/r-sig-hpc/attachments/20090623/c35cff2c/attachment-0001.obj>]

-- 
Three out of two people have difficulties with fractions.

