[R-sig-hpc] Error installing Rmpi over OpenMPI: Cannot find orted

Alejandro Gonzalez aleco.gt at gmail.com
Thu Nov 20 19:28:23 CET 2014


Hello again list, thanks for your replies.

I've reinstalled OMPI and Rmpi from the Ubuntu packages as you suggested
(sudo apt-get install openmpi-bin r-cran-rmpi). I've also installed
openmpi-common and libopenmpi-dev so that OMPI works properly again for C and
Python.

Unfortunately, Rmpi still isn't working. I've tried different PBS scripts and
R test files, but I'm not sure what I'm doing wrong.
This is my PBS script:
---
#!/bin/bash
#PBS -N R_test
#PBS -l nodes=laicbio:ppn=32+laicbio1:ppn=12+laicbio2:ppn=12+laicbio3:ppn=12+la$
cd "$PBS_O_WORKDIR"
Rscript --no-save test.R
---
This is the test.R file (found online)
---
# Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
        library("Rmpi")
}
# Spawn as many slaves as possible
mpi.spawn.Rslaves()
# In case R exits unexpectedly, have it automatically clean up
# resources taken up by Rmpi (slaves, memory, etc...)
.Last <- function() {
        if (is.loaded("mpi_initialize")) {
                if (mpi.comm.size(1) > 0) {
                        print("Please use mpi.close.Rslaves() to close slaves.")
                        mpi.close.Rslaves()
                }
                print("Please use mpi.quit() to quit R")
                .Call("mpi_finalize")
        }
}
# Tell all slaves to return a message identifying themselves
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
# Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
mpi.quit()
---

It's giving me the following errors:
---
$ cat R_test.e98
[laicbio:67788] [[32125,0],0] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
[laicbio:67788] [[32125,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 104
[laicbio:67788] [[32125,0],0] could not get route to [[32125,2],0]
---
And the following output:
---
$ cat R_test.o98
    1 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 2 is running on: laicbio
slave1 (rank 1, comm 1) of size 2 is running on: laicbio
$slave1
[1] "I am 1 of 2"

[1] 1
---
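A first thing to compare: the output shows mpi.spawn.Rslaves() starting a
single slave (communicator size 2, master and slave both on laicbio), while
the -l nodes= line asks for dozens of slots on several hosts. Torque writes
one line per granted slot into the file named by $PBS_NODEFILE, so counting
its lines shows what the job actually received. The sketch below, with
hypothetical helper names, adds up the ppn= values in a nodes= spec and
counts granted slots per host; the spec in the example is only the complete
part of the request above, since the rest of that line is cut off.

```shell
#!/bin/sh
# Hypothetical helper: total slots requested by a Torque "nodes=" spec, e.g.
# "nodes=laicbio:ppn=32+laicbio1:ppn=12". Each "+"-separated part is
# "<host or node count>[:ppn=N][:other properties]"; a missing ppn counts as 1.
sum_ppn() {
  printf '%s\n' "${1#nodes=}" | tr '+' '\n' | awk -F':' '
    NF == 0 { next }
    {
      ppn = 1
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ppn=[0-9]+$/) ppn = substr($i, 5) + 0
      # A leading number ("2:ppn=4") means that many nodes, not a host name.
      nodes = ($1 ~ /^[0-9]+$/) ? $1 + 0 : 1
      total += nodes * ppn
    }
    END { print total + 0 }'
}

# Hypothetical helper: slots actually granted per host, read from a Torque node
# file (one line per slot). Defaults to $PBS_NODEFILE inside a running job.
slots_per_host() {
  sort "${1:-$PBS_NODEFILE}" | uniq -c | awk '{ print $2, $1 }'
}

# Example, e.g. near the top of the PBS script before Rscript runs:
#   sum_ppn 'nodes=laicbio:ppn=32+laicbio1:ppn=12+laicbio2:ppn=12+laicbio3:ppn=12'   # 68
#   slots_per_host                                                                  # "<host> <slots>" lines
```

If the node file lists the expected slots but Rmpi still reports a size of
2, that would suggest the allocation is granted by Torque but not reaching
Rmpi, which narrows the search to how R and its slaves are launched.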

If I prefix the Rscript command with mpiexec in the PBS script, the job keeps
running (it never finishes) and I get lots of empty log files with names like
laicbio3.9740+1.10076.log (laicbio3 is one of the compute nodes).
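To see how quickly those empty logs pile up and on which nodes, a small
sketch like the one below counts zero-byte *.log files in a directory per
host. It assumes, based only on the single example name
laicbio3.9740+1.10076.log, that the host name is everything before the first
dot; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count zero-byte *.log files in a directory ($1, default ".")
# per host, assuming the host is the part of the file name before the first dot
# (e.g. laicbio3 in laicbio3.9740+1.10076.log). Prints "<host> <count>" lines.
empty_logs_by_host() (
  cd "${1:-.}" || exit 1
  # -size 0 matches only empty files; -maxdepth 1 stays in this directory.
  find . -maxdepth 1 -type f -name '*.log' -size 0 \
    | sed -e 's|^\./||' -e 's|\..*$||' \
    | sort | uniq -c | awk '{ print $2, $1 }'
)

# Example, run from the directory the job was submitted from, while it runs:
#   empty_logs_by_host .
```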

Could you suggest a way to test this and track the problem down?

Thanks again.
Alejandro

2014-11-08 10:59 GMT-06:00 Dirk Eddelbuettel <edd at debian.org>:

>
> On 6 November 2014 at 12:21, Alejandro Gonzalez wrote:
> | Hello list, this is my first message, but I've been relying on your help
> | for a while; thank you.
> |
> | I have a cluster of Ubuntu 14.04 machines with OpenMPI and I'm not able to
> | install Rmpi.
>
> What happens when you try
>
>      sudo apt-get install r-cran-rmpi
>
> as in most cases the pre-built binary will be just fine.
>
> | Here are some more specs of my system:
> | - I installed from sources Torque 4.2.9 and Maui 3.3.1
> | - OpenMPI version is 1.8.2 (I installed this one from source too)
> | - R version is 3.0.2 (This was installed with apt-get install)
> |
> | When I try to install Rmpi:
> | $ sudo R CMD INSTALL Rmpi_0.6-3.tar.gz --configure-args="--with-mpi=/opt/openmpi"
> |
> | I get the following:
> | ---
> | * installing to library ‘/usr/local/lib/R/site-library’
> | * installing *source* package ‘Rmpi’ ...
> | checking for gcc... gcc -std=gnu99
> | checking whether the C compiler works... yes
> | checking for C compiler default output file name... a.out
> | checking for suffix of executables...
> | checking whether we are cross compiling... no
> | checking for suffix of object files... o
> | checking whether we are using the GNU C compiler... yes
> | checking whether gcc -std=gnu99 accepts -g... yes
> | checking for gcc -std=gnu99 option to accept ISO C89... none needed
> | Trying to find mpi.h ...
> | Found in /opt/openmpi/include
> | Trying to find libmpi.so or libmpich.a ...
> | Found libmpi in /opt/openmpi/lib
> | checking for orted... no
> | configure: error: Cannot find orted. Rmpi needs orted to run.
>
> Given that we have an existing Debian (and Ubuntu) package which has been
> built for years, "all" you need to do is to ensure that you too have what is
> called the 'Build-Depends' needed to build the package.  Each Debian package
> writes these down in their configuration, and here it is (and I wrapped lines
> for the email)
>
>     Build-Depends: debhelper (>= 7.0.0), cdbs, \
>          r-base-dev (>= 3.1.0), \
>          mpi-default-dev, mpi-default-bin
>
> where line one just deals with Debian packaging internals, line two ensures R
> is present (doh !!) and line three ensures that you have both the binaries
> and headers / libraries for the default MPI implementation on your
> architecture -- which is OpenMPI on most of them (and MPICH on some less
> common architectures).
>
> I do not think this has anything to do with Torque (though I could be
> overlooking something, Ei-ji usually knows very very well what he is talking
> about).
>
> But as I said: there is generally no reason to build this from source.
>
> Dirk
>
>
> | ERROR: configuration failed for package ‘Rmpi’
> | * removing ‘/usr/local/lib/R/site-library/Rmpi’
> | ---
> |
> | I've read the Rmpi news,
> | http://r.789695.n4.nabble.com/Problem-installing-Rmpi-with-Open-MPI-tt4641762.html#none
> | and http://www.open-mpi.org/community/lists/devel/2012/04/10840.php and
> | then tried to install Rmpi using a new build of OpenMPI, which I configured
> | this way:
> | $ ./configure --with-tm=/opt/torque --prefix=/opt/openmpi_disable_dlopen --disable-dlopen
> | But I got the same error (configure: error: Cannot find orted. Rmpi needs
> | orted to run.).
> |
> | Am I doing something wrong? Do you have any idea how I can install Rmpi?
> | I'd also like to understand what --disable-dlopen means, why it's
> | necessary for Rmpi, and what happens when I run other MPI software with an
> | OpenMPI that has been configured with --disable-dlopen. Could you point me
> | to some reading?
> |
> | Thanks in advance.
> | Alejandro
> |
> | _______________________________________________
> | R-sig-hpc mailing list
> | R-sig-hpc at r-project.org
> | https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>
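On the original orted error: the line "checking for orted... no" has the form
of an autoconf program check, which looks for the program in the directories
on PATH. Assuming that is how Rmpi's configure script finds orted (not
confirmed in this thread), one thing worth ruling out is that /opt/openmpi/bin
was missing from the PATH that sudo R CMD INSTALL saw; sudo on Ubuntu usually
replaces PATH with its own secure_path. With a source-built OpenMPI under
/opt/openmpi and Ubuntu's openmpi-bin now both installed, it is also worth
knowing which copies of mpirun and orted each environment finds first. The
sketch below, with a hypothetical function name, lists every executable of a
given name on a PATH-style string, in search order.

```shell
#!/bin/sh
# Hypothetical helper: list every executable named $1 on a PATH-style string
# ($2, default: the current $PATH), in search order, similar to "which -a".
# Exit status is 0 if at least one match was found, 1 otherwise.
all_on_path() (
  prog=$1
  search=${2-$PATH}
  status=1
  IFS=:      # split the search string on colons (only inside this subshell)
  set -f     # do not glob-expand directory names
  for dir in $search; do
    [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
    if [ -f "$dir/$prog" ] && [ -x "$dir/$prog" ]; then
      printf '%s\n' "$dir/$prog"
      status=0
    fi
  done
  exit "$status"
)

# Compare a normal login shell with what sudo sees (run on the head node):
#   for p in mpirun orted; do all_on_path "$p" || echo "$p: not on PATH"; done
#   all_on_path orted "$(sudo sh -c 'echo "$PATH"')" || echo "orted: not on sudo PATH"
```

If orted turns up only for the normal shell, passing the directory through
explicitly, for example sudo env PATH="/opt/openmpi/bin:$PATH" R CMD INSTALL
..., is one way to test that theory; it has not been tried on this cluster.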
