[R-sig-hpc] Error installing Rmpi over OpenMPI: Cannot find orted

Alejandro Gonzalez aleco.gt at gmail.com
Fri Nov 21 18:35:00 CET 2014


Hello Ei-ji, thank you for helping me.

I can see libtorque:
---
$ ldd /usr/lib/openmpi/lib/openmpi/mca_plm_tm.so |grep libtorque
libtorque.so.2 => /opt/torque/lib/libtorque.so.2 (0x00007f2a8558e000)
---

I modified my PBS script and R file as you suggested, but the job still
did not finish. I was still getting many logs, some empty and some not.
From the non-empty ones I noticed that OpenMPI was trying to use a network
interface other than the one on the cluster network, even though that
other interface was not connected. I followed the instructions at
https://www.open-mpi.org/faq/?category=tcp#tcp-selection and now Rmpi is
working properly! I don't know why I was getting these errors only from R,
and not from C or Python.
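
For anyone reading this later, here is a minimal sketch of the kind of
interface restriction that FAQ entry describes. It is not the exact
configuration used on this cluster: the interface name "eth0" is a
placeholder (the message above does not say which interface was chosen),
and setting the MCA parameters through OMPI_MCA_* environment variables
rather than mpirun's --mca option is just one of the ways Open MPI accepts
them.

```shell
# Assumption: "eth0" is a placeholder for the interface on the cluster
# network; the original message does not name the interface.
CLUSTER_IF="eth0"

# Restrict Open MPI's TCP traffic (MPI data and out-of-band control
# messages) to that interface. An OMPI_MCA_<name> environment variable is
# equivalent to passing "--mca <name> <value>" to mpirun.
export OMPI_MCA_btl_tcp_if_include="$CLUSTER_IF"
export OMPI_MCA_oob_tcp_if_include="$CLUSTER_IF"

# Echo the settings so they show up in the job's output log.
echo "btl_tcp_if_include=$OMPI_MCA_btl_tcp_if_include"
echo "oob_tcp_if_include=$OMPI_MCA_oob_tcp_if_include"
```

In a PBS script these lines would presumably go before the
"mpirun -np 1 Rscript --no-save test.R" call quoted below, so that mpirun
and the processes it launches inherit the settings.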

I hope this helps someone in the future.

Thank you very much!
Alejandro




2014-11-20 22:30 GMT-06:00 Ei-ji Nakama <nakama at ki.rim.or.jp>:

> hello
>
>  Because Open MPI gets its node information from Torque, mpirun is
> necessary. When the calling process is not a child of mpirun,
> MPI_Comm_spawn starts processes via ssh or rsh, and those processes know
> nothing about Torque...
>
> please check...
> can you see libtorque?
>
> c.f.
> $ ldd /usr/lib/openmpi/lib/openmpi/mca_plm_tm.so |grep libtorque
> libtorque.so.2 => /usr/lib/libtorque.so.2 (0x00007fd68e189000)
>
> <snip>
> > #!/bin/bash
> > #PBS -N R_test
> > #PBS -l nodes=laicbio:ppn=32+laicbio1:ppn=12+laicbio2:ppn=12+laicbio3:ppn=12+la$
> > cd $PBS_O_WORKDIR
> > Rscript --no-save test.R
>
> c.f.
>   mpirun -np 1 Rscript --no-save test.R
>
> Only a master process starts, with option  `-np 1'
>
> <snip>
> > mpi.spawn.Rslaves()
> <snip>
>
>   mpi.spawn.Rslaves(nslaves=mpi.universe.size()-1)
>
> You need to reduce the number of slaves by one to leave a slot for the master.
>
>
> > It's giving me the following errors:
> > ---
> > $ cat R_test.e98
> > [laicbio:67788] [[32125,0],0] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
> > [laicbio:67788] [[32125,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 104
> > [laicbio:67788] [[32125,0],0] could not get route to [[32125,2],0]
> > ---
> > And the following output:
> > ---
> > $ cat R_test.o98
> >     1 slaves are spawned successfully. 0 failed.
> > master (rank 0, comm 1) of size 2 is running on: laicbio
> > slave1 (rank 1, comm 1) of size 2 is running on: laicbio
> > $slave1
> > [1] "I am 1 of 2"
> >
> > [1] 1
> > ---
> >
> > If I add mpiexec before Rscript to the PBS script, the job keeps running
> > (doesn't finish) and I get lots of empty logs named like
> > laicbio3.9740+1.10076.log, laicbio3 is one of the working nodes.
> >
> > Could you suggest a way to test this and track the problem down?
> >
> > Thanks again.
> > Alejandro
> >
> > 2014-11-08 10:59 GMT-06:00 Dirk Eddelbuettel <edd at debian.org>:
> >
> >>
> >> On 6 November 2014 at 12:21, Alejandro Gonzalez wrote:
> >> | Hello List, this is my first message but I've been using your help for a
> >> | while, thank you.
> >> |
> >> | I have a cluster of Ubuntu 14.04 machines with OpenMPI and I am not
> >> | able to install Rmpi.
> >>
> >> What happens when you try
> >>
> >>      sudo apt-get install r-cran-rmpi
> >>
> >> as in most cases the pre-built binary will be just fine.
> >>
> >> | Here are some more specs of my system:
> >> | - I installed from sources Torque 4.2.9 and Maui 3.3.1
> >> | - OpenMPI version is 1.8.2 (I installed this one from source too)
> >> | - R version is 3.0.2 (This was installed with apt-get install)
> >> |
> >> | When I try to install Rmpi:
> >> | $ sudo R CMD INSTALL Rmpi_0.6-3.tar.gz --configure-args="--with-mpi=/opt/openmpi"
> >> |
> >> | I get the following:
> >> | ---
> >> | * installing to library '/usr/local/lib/R/site-library'
> >> | * installing *source* package 'Rmpi' ...
> >> | checking for gcc... gcc -std=gnu99
> >> | checking whether the C compiler works... yes
> >> | checking for C compiler default output file name... a.out
> >> | checking for suffix of executables...
> >> | checking whether we are cross compiling... no
> >> | checking for suffix of object files... o
> >> | checking whether we are using the GNU C compiler... yes
> >> | checking whether gcc -std=gnu99 accepts -g... yes
> >> | checking for gcc -std=gnu99 option to accept ISO C89... none needed
> >> | Trying to find mpi.h ...
> >> | Found in /opt/openmpi/include
> >> | Trying to find libmpi.so or libmpich.a ...
> >> | Found libmpi in /opt/openmpi/lib
> >> | checking for orted... no
> >> | configure: error: Cannot find orted. Rmpi needs orted to run.
> >>
> >> Given that we have an existing Debian (and Ubuntu) package which has been
> >> built for years, "all" you need to do is to ensure that you too have what is
> >> called the 'Build-Depends' needed to build the package.  Each Debian package
> >> writes these down in their configuration, and here it is (and I wrapped lines
> >> for the email)
> >>
> >>     Build-Depends: debhelper (>= 7.0.0), cdbs, \
> >>          r-base-dev (>= 3.1.0), \
> >>          mpi-default-dev, mpi-default-bin
> >>
> >> where line one just deals with Debian packaging internals, line two ensures R
> >> is present (doh !!) and line three ensures that you have both the binaries
> >> and headers / libraries for the default MPI implementation on your
> >> architecture -- which is OpenMPI on most of them (and MPICH on some less
> >> common architectures).
> >>
> >> I do not think this has anything to do with Torque (though I could be
> >> overlooking something, Ei-ji usually knows very very well what he is talking
> >> about).
> >>
> >> But as I said: there is generally no reason to build this from source.
> >>
> >> Dirk
> >>
> >>
> >> | ERROR: configuration failed for package 'Rmpi'
> >> | * removing '/usr/local/lib/R/site-library/Rmpi'
> >> | ---
> >> |
> >> | I've read the Rmpi news,
> >> | http://r.789695.n4.nabble.com/Problem-installing-Rmpi-with-Open-MPI-tt4641762.html#none
> >> | and http://www.open-mpi.org/community/lists/devel/2012/04/10840.php and
> >> | then tried to install Rmpi using a new build of OpenMPI, that I configured
> >> | this way:
> >> | $ ./configure --with-tm=/opt/torque --prefix=/opt/openmpi_disable_dlopen --disable-dlopen
> >> | But I got the same error (configure: error: Cannot find orted. Rmpi needs
> >> | orted to run.).
> >> |
> >> | Am I doing something wrong? Do you have a clue on how I can install Rmpi?
> >> | I'd also like to understand more about what --disable-dlopen means, why
> >> | it's necessary for Rmpi and what happens when I run other MPI software when
> >> | OpenMPI has been configured with --disable-dlopen. Could you share some
> >> | reading?
> >> |
> >> | Thanks in advance.
> >> | Alejandro
> >> |
> >> |       [[alternative HTML version deleted]]
> >> |
> >> | _______________________________________________
> >> | R-sig-hpc mailing list
> >> | R-sig-hpc at r-project.org
> >> | https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> >>
> >> --
> >> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
> >>
> >
>
> Best Regards,
> --
> Eiji NAKAMA <nakama (a) ki.rim.or.jp>
> "中間栄治"  <nakama (a) ki.rim.or.jp>
>
