[R-sig-hpc] [R] segmentation fault with Rmpi and OpenMPI on Ubuntu 9.04

Mark Mueller mark.mueller at gmail.com
Thu Jun 25 01:30:16 CEST 2009


I should have mentioned in the original post that the R script runs
flawlessly when executed in isolation on either the master or the
slave using MPI.  This is true even if a value for 'slots' and/or
'max-slots' is specified in the hostfile.  Given this, it seems strange
that the issue would be anything of interest to the author of Text
Miner.
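
For reference, a hostfile of the kind mentioned above (host names here are
placeholders, not our actual machines) would look something like

  master  slots=2 max-slots=2
  slave   slots=1 max-slots=1

and is handed to mpirun via --hostfile, as in the command quoted further
down in the thread.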

On Wed, Jun 24, 2009 at 1:13 PM, Dirk Eddelbuettel<edd at debian.org> wrote:
>
> Hi Mark,
>
> On 24 June 2009 at 12:31, Mark Mueller wrote:
> | Many thanks for your reply and initial look at the situation.
>
> My pleasure.
>
> | I ran through the examples you listed below, and everything worked just
> | fine (as I suspected it would).  Unfortunately, when I run my R
> | program I still get the segmentation fault.  The R program itself uses
> | the Text Miner package, which in turn relies on the Snow package (and
> | hence Rmpi), so there isn't likely anything specifically in my R
> | program that is causing the issue (rather, it might be something in
> | one of the packages?).  As I understand, Text Miner simply calls
> | routines in the Snow package which then invokes Rmpi calls.
>
> Aieee. Then it's between you and the author of TextMiner :) unless you find
> something below.
>
> | Sample lines from my R program --
> | =======
> |
> | library (tm)
> |
> | activateCluster() <-- uses Snow
> | ... [some R + text miner code (with text miner further invoking the Snow API)]
> | deactivateCluster() <-- uses Snow
> |
> | =======
> |
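> | (For comparison, the same cluster lifecycle written directly against
> | snow -- makeCluster(), parLapply() and stopCluster() are plain snow
> | calls, not part of my program -- would be roughly:
> |
> |   library(snow)
> |   cl <- makeCluster(2, type = "MPI")            # spawn 2 Rmpi workers
> |   print(parLapply(cl, 1:4, function(i) i * i))  # trivial parallel job
> |   stopCluster(cl)
> | )
> |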
> | We do not use SLURM (or any other resource manager) at this point, in
> | an effort to keep things simple as we build out our parallel
> | environment.
> |
> | I just wonder if there is something with the 64-bit master and 32-bit
> | slave that is the cause, although I'm not sure how to uncover that.
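> |
> | (One way to probe that from inside R, once the cluster has been
> | activated, might be to ask each worker what it is -- clusterCall() and
> | getMPIcluster() are standard snow calls:
> |
> |   snow::clusterCall(snow::getMPIcluster(),
> |                     function() c(arch = R.version$arch,
> |                                  pointer_bytes = .Machine$sizeof.pointer))
> | )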
>
> Redefine your test setup to master/master only, and slave/slave.  You can
> perfectly well run Rmpi, snow, ... on a single machine.  I.e. do something like
>
>          orterun --host master -n 4 ./path/to/script
>
> or edit the hostfile you used previously. Same for the slave(s).
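>
> (A master-only hostfile for that test could be as simple as a single line,
> e.g.
>
>          master slots=4
>
> with the slave entry removed or commented out; swap the two hosts for the
> slave-only run.)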
>
> Hth, Dirk
>
>
> |
> | On Tue, Jun 23, 2009 at 8:00 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
> | >
> | > Hi Mark,
> | >
> | > On 23 June 2009 at 19:38, Mark Mueller wrote:
> | > | PROBLEM DEFINITION --
> | > |
> | > | Master:
> | > |
> | > | - AMD_64
> | > | - Ubuntu 9.04 64-bit
> | > | - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
> | > | to the localhost via ssh to run local jobs) - manually downloaded source and
> | > | compiled
> | > | - Rmpi 0.5-7 - package installed using R install.packages()
> | > | - R 2.9.0 - installed using apt-get
> | >
> | > Ok. [ I prefer to take the Debian sources and rebuild local packages on
> | > Ubuntu for things like Open MPI, but otherwise it looks fine. ]
> | >
> | > | Slave:
> | > |
> | > | - Intel Pentium 4 32-bit
> | > | - Ubuntu 9.04 32-bit
> | > | - OpenMPI 1.3.2 (to avoid the problem in v1.3 where OpenMPI tries to connect
> | > | to the localhost via ssh to run local jobs) - manually downloaded source and
> | > | compiled
> | > | - Rmpi 0.5-7 - package installed using R install.packages()
> | > | - R 2.9.0 - installed using apt-get
> | >
> | > Same -- but I am cautious about the 32bit / 64bit mix. I have no experience
> | > there.  At work everything is 64bit, at home everything is 32bit.
> | >
> | > | When executing the following command from the master:
> | > |
> | > | --> mpirun --hostfile <some file> -np 1 R CMD BATCH <some program>.R
> | > |
> | > | the following trace results on the master node (lines 18 and 19 are from my
> | > | particular R program):
> | > |
> | > | *** caught segfault ***
> | > | address 0x10333e4d8, cause 'memory not mapped'
> | > |
> | > | Traceback:
> | > |  1: .Call("mpi_recv", x, as.integer(type), as.integer(source),
> | > | as.integer(tag),     as.integer(comm), as.integer(status), PACKAGE = "Rmpi")
> | > |  2: mpi.recv(x = raw(charlen), type = 4, srctag[1], srctag[2], comm,
> | > | status)
> | > |  3: typeof(connection)
> | > |  4: unserialize(obj)
> | > |  5: .mpi.unserialize(mpi.recv(x = raw(charlen), type = 4, srctag[1],
> | > | srctag[2], comm, status))
> | > |  6: mpi.recv.Robj(node$rank, node$RECVTAG, node$comm)
> | > |  7: recvData.MPInode(con)
> | > |  8: recvData(con)
> | > |  9: FUN(X[[6L]], ...)
> | > | 10: lapply(cl[1:jobs], recvResult)
> | > | 11: staticClusterApply(cl, fun, length(x), argfun)
> | > | 12: clusterApply(cl, splitList(x, length(cl)), lapply, fun, ...)
> | > | 13: is.vector(X)
> | > | 14: lapply(args, enquote)
> | > | 15: do.call("fun", lapply(args, enquote))
> | > | 16: docall(c, clusterApply(cl, splitList(x, length(cl)), lapply,     fun,
> | > | ...))
> | > | 17: snow::parLapply(snow::getMPIcluster(), object, FUN, ..., DMetaData =
> | > | DMetaData(object))
> | > | 18: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
> | > | "realestate")
> | > | 19: tmMap(corpuscleanrand, replacePatterns, ("real estate"), by =
> | > | "realestate")
> | > | aborting ...
> | > | Segmentation fault
> | >
> | > In a case like this I always prefer to step back and run simple scripts (as
> | > from my "Intro to HPC with R" tutorials).  E.g. can you run
> | >
> | > a) a simple mpiHelloWorld C program with no other depends between master and
> | >   slave nodes ?  This shows basic MPI functionality.
> | >
> | >   mpiHelloWorld.c is attached. Do
> | >
> | >   $ mpicc -o mpiHelloWorld mpiHelloWorld.c
> | >   $ # cp and scp to /tmp on master and slave
> | >   $ orterun -n 4 -H master,slave /tmp/mpiHelloWorld
> | >
> | > b) same for a simple Rmpi script doing the same (see the sketch after this list) ?  This shows R/MPI interaction.
> | >
> | >   Likewise, place mpiHelloWorld.r in /tmp on each machine, then
> | >
> | >   $ orterun -n 4 -H master,slave /tmp/mpiHelloWorld.r
> | >
> | > c) do the same for snow (by writing a simple snow/MPI file)
> | >
> | > d) if you care for slurm, do the same with slurm to allocate resources
> | >   within which you then run orterun to launch R/MPI jobs.
> | >
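> | > As a concrete starting point for b), a script of roughly this shape would
> | > do (a sketch only, not the attached file; mpi.comm.rank(), mpi.comm.size()
> | > and mpi.get.processor.name() are standard Rmpi calls):
> | >
> | >   #!/usr/bin/env Rscript
> | >   # mpiHelloWorld.r (sketch) -- one line of output per MPI rank
> | >   library(Rmpi)
> | >   cat("Hello from rank", mpi.comm.rank(0), "of", mpi.comm.size(0),
> | >       "on", mpi.get.processor.name(), "\n")
> | >   mpi.quit()
> | >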
> | > | CONFIGURATION STEPS TAKEN --
> | > |
> | > | - There is no common/shared file system mounted for the cluster.
> | > |
> | > | - All PATH and LD_LIBRARY_PATH environment variables for OpenMPI are
> | > | properly set on each node (including the master).
> | > |
> | > | - OpenMPI was configured and built on each node with the
> | > | --enable-heterogeneous configuration flag to account for the AMD-64 and
> | > | Intel-32 architectures.
> | > |
> | > | - The R_SNOW_LIB environment variable is set properly and the RunSnowNode
> | > | and RunSnowWorker scripts are located in the PATH (and set to executable) on
> | > | all nodes (including the master).
> | > |
> | > | - All of the OpenMPI settings as documented in the OpenMPI FAQs to allow for
> | > | remote execution (i.e. rsh/ssh, .rhosts) are in place.
> | > |
> | > | Any insight or assistance will be greatly appreciated.
> | >
> | > As outlined above, I try to stick with the 'tested' configuration from the
> | > Debian packages so I don't have to deal with all the env vars etc.  Also,
> | > decomposing down from snow to Rmpi to MPI by itself may help.
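> | >
> | > (A quick sanity check for the 64/32-bit mix: run ompi_info on both nodes
> | > and compare the reported version and the heterogeneous-support line, if
> | > your build prints one -- something like
> | >
> | >   $ ompi_info | grep -i -e "open mpi:" -e hetero
> | >
> | > on each machine.)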
> | >
> | > Best regards, Dirk
> | >
> | >
> | > |
> | > | Sincerely,
> | > | Mark
> | > |
> | >
> | >
> | >
> | > --
> | > Three out of two people have difficulties with fractions.
> | >
>
> --
> Three out of two people have difficulties with fractions.
>


