[R-sig-hpc] [R] segmentation fault with Rmpi and OpenMPI on Ubuntu 9.04

Mark Mueller mark.mueller at gmail.com
Wed Jun 24 19:31:11 CEST 2009


Hello Dirk,

Many thanks for your reply and initial look at the situation.

I ran through the examples you listed below, and everything worked just
fine (as I suspected it would).  Unfortunately, when I run my R
program I still get the segmentation fault.  The R program itself uses
the tm (text mining) package, which in turn relies on the snow package
(and hence Rmpi), so it is unlikely that anything specific to my R
program is causing the issue; rather, it may be something in one of
the packages.  As I understand it, tm simply calls routines in the
snow package, which in turn invokes Rmpi.

Sample lines from my R program --
=======

library(tm)

activateCluster()      # uses snow
## ... some R + tm code (tm in turn invoking the snow API) ...
deactivateCluster()    # uses snow

=======

We do not use SLURM (or any other resource-allocation solution) at
this point, in an effort to keep things simple as we build out our
parallel environment.

I wonder whether the mix of a 64-bit master and a 32-bit slave is the
cause, although I'm not sure how to uncover that.
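
For what it's worth, I suppose I could have each worker report its
word size through snow.  Something like the following untested sketch
(the worker count of 2 is arbitrary) should show which architecture
each node's R actually runs; .Machine$sizeof.pointer is 4 on a 32-bit
build and 8 on a 64-bit build:

  library(snow)
  cl <- makeMPIcluster(2)              # spawn two workers over Rmpi
  # each worker reports its architecture and pointer size
  print(clusterCall(cl, function()
      c(arch = R.version$arch, ptr.bytes = .Machine$sizeof.pointer)))
  stopCluster(cl)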

On Tue, Jun 23, 2009 at 8:00 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
>
> Hi Mark,
>
> On 23 June 2009 at 19:38, Mark Mueller wrote:
> | PROBLEM DEFINITION --
> |
> | Master:
> |
> | - AMD_64
> | - Ubuntu 9.04 64-bit
> | - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
> | to the localhost via ssh to run local jobs) - manually downloaded source and
> | compiled
> | - Rmpi 0.5-7 - package installed using R install.packages()
> | - R 2.9.0 - installed using apt-get
>
> Ok. [ For things like Open MPI I prefer to take the Debian sources and
> rebuild local packages on Ubuntu, but otherwise it looks fine. ]
>
> | Slave:
> |
> | - Intel Pentium 4 32-bit
> | - Ubuntu 9.04 32-bit
> | - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
> | to the localhost via ssh to run local jobs) - manually downloaded source and
> | compiled
> | - Rmpi 0.5-7 - package installed using R install.packages()
> | - R 2.9.0 - installed using apt-get
>
> Same -- but I am cautious about the 32-bit / 64-bit mix. I have no experience
> there.  At work everything is 64-bit, at home everything is 32-bit.
>
> | When executing the following command from the master:
> |
> | --> mpirun --hostfile <some file> -np 1 R CMD BATCH <some program>.R
> |
> | the following trace results on the master node (frames 18 and 19 below are
> | calls from my particular R program):
> |
> | *** caught segfault ***
> | address 0x10333e4d8, cause 'memory not mapped'
> |
> | Traceback:
> |  1: .Call("mpi_recv", x, as.integer(type), as.integer(source),
> | as.integer(tag),     as.integer(comm), as.integer(status), PACKAGE = "Rmpi")
> |  2: mpi.recv(x = raw(charlen), type = 4, srctag[1], srctag[2], comm,
> | status)
> |  3: typeof(connection)
> |  4: unserialize(obj)
> |  5: .mpi.unserialize(mpi.recv(x = raw(charlen), type = 4, srctag[1],
> | srctag[2], comm, status))
> |  6: mpi.recv.Robj(node$rank, node$RECVTAG, node$comm)
> |  7: recvData.MPInode(con)
> |  8: recvData(con)
> |  9: FUN(X[[6L]], ...)
> | 10: lapply(cl[1:jobs], recvResult)
> | 11: staticClusterApply(cl, fun, length(x), argfun)
> | 12: clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)
> | 13: is.vector(X)
> | 14: lapply(args, enquote)
> | 15: do.call("fun", lapply(args, enquote))
> | 16: docall(c, clusterApply(cl, splitList(x, length(cl)), lapply,     fun,
> | ...))
> | 17: snow::parLapply(snow::getMPIcluster(), object, FUN, ..., DMetaData =
> | DMetaData(object))
> | 18: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
> | "realestate")
> | 19: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
> | "realestate")
> | aborting ...
> | Segmentation fault
>
> In a case like this I always prefer to step back and run simple scripts (as
> from my "Intro to HPC with R" tutorials).  E.g. can you run
>
> a) a simple mpiHelloWorld C program with no other depends between master and
>   slave nodes ?  This shows basic MPI functionality.
>
>   mpiHelloWorld.c is attached. Do
>
>   $ mpicc -o mpiHelloWorld mpiHelloWorld.c
>   $ # cp and scp to /tmp on master and slave
>   $ orterun -n 4 -H master,slave /tmp/mpiHelloWorld
>
> b) the same with a simple Rmpi script?  This shows R/MPI interaction (a rough
>   sketch follows after this list).
>
>   Likewise, place mpiHelloWorld.r in /tmp on each machine, then
>
>   $ orterun -n 4 -H master,slave /tmp/mpiHelloWorld.r
>
> c) do the same for snow, by writing a simple snow/MPI file (also sketched
>   below).
>
> d) if you care about slurm, do the same with slurm to allocate resources
>   within which you then run orterun to launch R/MPI jobs.
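>
>   For what it is worth, here are untested sketches of what (b) and (c)
>   might look like -- treat them as rough outlines rather than the exact
>   attached scripts, and the worker count in the snow example is arbitrary:
>
>   #!/usr/bin/env Rscript
>   # mpiHelloWorld.r -- launch via orterun as shown above
>   library(Rmpi)                      # loading Rmpi initializes MPI
>   rank <- mpi.comm.rank(0)           # rank within MPI_COMM_WORLD
>   size <- mpi.comm.size(0)           # total number of processes
>   host <- mpi.get.processor.name()
>   cat("Hello, rank", rank, "of", size, "on", host, "\n")
>   mpi.quit()                         # finalize MPI and quit R
>
>   # snowHello.r -- minimal snow-over-MPI check, run as a plain R script
>   library(snow)
>   cl <- makeMPIcluster(2)            # spawn two workers via Rmpi
>   print(clusterCall(cl, function() Sys.info()[["nodename"]]))
>   stopCluster(cl)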
>
> | CONFIGURATION STEPS TAKEN --
> |
> | - There is no common/shared file system mounted for the cluster.
> |
> | - All PATH and LD_LIBRARY_PATH environment variables for OpenMPI are
> | properly set on each node (including the master).
> |
> | - OpenMPI was configured and built on each node with the
> | --enable-heterogeneous configuration flag to account for the AMD-64 and
> | Intel-32 architectures.
> |
> | - The R_SNOW_LIB environment variable is set properly, and the RunSnowNode
> | and RunSnowWorker scripts are located in the PATH (and set to executable) on
> | all nodes (including the master).
> |
> | - All of the OpenMPI settings documented in the OpenMPI FAQs to allow for
> | remote execution (e.g. rsh/ssh, .rhosts) are in place.
> |
> | Any insight or assistance will be greatly appreciated.
>
> As outlined above, I try to stick with the 'tested' configuration from the
> Debian packages so I don't have to deal with all the env vars etc.  Also,
> decomposing from snow down to Rmpi down to plain MPI may help.
>
> Best regards, Dirk
>
>
> |
> | Sincerely,
> | Mark
> |
>
>
>
> --
> Three out of two people have difficulties with fractions.
>


