[R-sig-hpc] Error installing Rmpi over OpenMPI: Cannot find orted

Ei-ji Nakama nakama at ki.rim.or.jp
Fri Nov 21 05:30:15 CET 2014


Hello,

 Because Open MPI gets its information from Torque, mpirun is necessary.
 When the process is not running as a child of mpirun, MPI_Comm_spawn
starts processes via ssh or rsh, and those processes know nothing about Torque.

Please check whether libtorque shows up in your Open MPI installation.

For example:
$ ldd /usr/lib/openmpi/lib/openmpi/mca_plm_tm.so |grep libtorque
libtorque.so.2 => /usr/lib/libtorque.so.2 (0x00007fd68e189000)
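
The check above can be wrapped in a small script. This is only a sketch: the plugin path is the one from my example and will differ on a source build (for instance under /opt/openmpi), and a build configured with --disable-dlopen may not have a separate plugin file at all. The function name links_libtorque is made up for illustration.

```shell
#!/bin/bash
# Sketch: report whether an Open MPI "tm" launcher plugin links against libtorque.
# The default path is taken from the example above; it is an assumption and
# may not match your installation.

# links_libtorque (hypothetical helper): reads `ldd` output on stdin and
# succeeds if a libtorque shared object is listed.
links_libtorque() {
    grep -q 'libtorque\.so'
}

plugin="${1:-/usr/lib/openmpi/lib/openmpi/mca_plm_tm.so}"

if [ ! -e "$plugin" ]; then
    echo "plugin not found: $plugin (check the path, or whether tm support was built)"
elif ldd "$plugin" | links_libtorque; then
    echo "tm plugin is linked against libtorque"
else
    echo "tm plugin exists but does not list libtorque"
fi
```

Pass the plugin path as the first argument if your Open MPI lives somewhere else.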

<snip>
> #!/bin/bash
> #PBS -N R_test
> #PBS -l nodes=laicbio:ppn=32+laicbio1:ppn=12+laicbio2:ppn=12+laicbio3:ppn=12+la$
> cd $PBS_O_WORKDIR
> Rscript --no-save test.R

Use this instead:
  mpirun -np 1 Rscript --no-save test.R

With the option `-np 1', only the master process is started.

<snip>
> mpi.spawn.Rslaves()
<snip>

  mpi.spawn.Rslaves(nslaves=mpi.universe.size()-1)

You need to reduce the number of slaves by one to leave a slot for the master.
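
Putting the two changes together, the job script could look roughly like the sketch below. It assumes Torque writes one line per assigned slot to $PBS_NODEFILE and that test.R already calls mpi.spawn.Rslaves(nslaves=mpi.universe.size()-1); the helper slot_count is made up for illustration, and the resource request line is left out because it is truncated in the original.

```shell
#!/bin/bash
#PBS -N R_test
# (add the "#PBS -l nodes=..." request from the original script here)

# slot_count (hypothetical helper): number of non-empty lines in a node file,
# assumed to equal the number of slots Torque assigned to the job.
slot_count() {
    grep -c . "$1"
}

if [ -n "$PBS_NODEFILE" ] && [ -r "$PBS_NODEFILE" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    slots=$(slot_count "$PBS_NODEFILE")
    echo "Torque slots: $slots (1 master + $((slots - 1)) slaves)"
    # Start only the master under mpirun; test.R spawns the slaves itself.
    mpirun -np 1 Rscript --no-save test.R
else
    echo "PBS_NODEFILE is not set; submit this script with qsub"
fi
```

The echo line is only there so the job output shows how many slaves to expect.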


> It's giving me the following errors:
> ---
> $ cat R_test.e98
> [laicbio:67788] [[32125,0],0] ORTE_ERROR_LOG: Not found in file routed_binomial.c at line 386
> [laicbio:67788] [[32125,0],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 104
> [laicbio:67788] [[32125,0],0] could not get route to [[32125,2],0]
> ---
> And the following output:
> ---
> $ cat R_test.o98
>     1 slaves are spawned successfully. 0 failed.
> master (rank 0, comm 1) of size 2 is running on: laicbio
> slave1 (rank 1, comm 1) of size 2 is running on: laicbio
> $slave1
> [1] "I am 1 of 2"
>
> [1] 1
> ---
>
> If I add mpiexec before Rscript to the PBS script, the job keeps running
> (doesn't finish) and I get lots of empty logs named like
> laicbio3.9740+1.10076.log, laicbio3 is one of the working nodes.
>
> May you suggest me a way for testing to track the problem down?
>
> Thanks again.
> Alejandro
>
> 2014-11-08 10:59 GMT-06:00 Dirk Eddelbuettel <edd at debian.org>:
>
>>
>> On 6 November 2014 at 12:21, Alejandro Gonzalez wrote:
>> | Hello List, this is my first message but I've been using your help for a
>> | while, thank you.
>> |
>> | I have a cluster of Ubuntu 14.04 machines with OpenMPI and I'm not being
>> | able to install Rmpi.
>>
>> What happens when you try
>>
>>      sudo apt-get install r-cran-rmpi
>>
>> as in most cases the pre-built binary will be just fine.
>>
>> | Here are some more specs of my system:
>> | - I installed from sources Torque 4.2.9 and Maui 3.3.1
>> | - OpenMPI version is 1.8.2 (I installed this one from source too)
>> | - R version is 3.0.2 (This was installed with apt-get install)
>> |
>> | When I try to install Rmpi:
>> | $ sudo R CMD INSTALL Rmpi_0.6-3.tar.gz --configure-args="--with-mpi=/opt/openmpi"
>> |
>> | I get the following:
>> | ---
>> | * installing to library '/usr/local/lib/R/site-library'
>> | * installing *source* package 'Rmpi' ...
>> | checking for gcc... gcc -std=gnu99
>> | checking whether the C compiler works... yes
>> | checking for C compiler default output file name... a.out
>> | checking for suffix of executables...
>> | checking whether we are cross compiling... no
>> | checking for suffix of object files... o
>> | checking whether we are using the GNU C compiler... yes
>> | checking whether gcc -std=gnu99 accepts -g... yes
>> | checking for gcc -std=gnu99 option to accept ISO C89... none needed
>> | Trying to find mpi.h ...
>> | Found in /opt/openmpi/include
>> | Trying to find libmpi.so or libmpich.a ...
>> | Found libmpi in /opt/openmpi/lib
>> | checking for orted... no
>> | configure: error: Cannot find orted. Rmpi needs orted to run.
>>
>> Given that we have an existing Debian (and Ubuntu) package which has been
>> built for years, "all" you need to do is to ensure that you too have what is
>> called the 'Build-Depends' needed to build the package.  Each Debian package
>> writes these down in their configuration, and here it is (and I wrapped lines
>> for the email)
>>
>>     Build-Depends: debhelper (>= 7.0.0), cdbs, \
>>          r-base-dev (>= 3.1.0), \
>>          mpi-default-dev, mpi-default-bin
>>
>> where line one just deals with Debian packaging internals, line two ensure R
>> is present (doh !!) and line three ensures that you have both the binaries
>> and headers / libraries for the default MPI implementation on your
>> architecture -- which is OpenMPI on most of them (and MPICH on some less
>> common architectures).
>>
>> I do not think this has anything to do with Torque (though I could be
>> overlooking something, Ei-ji usually knows very very well what he is talking
>> about).
>>
>> But as I said: there is generally no reason to build this from source.
>>
>> Dirk
>>
>>
>> | ERROR: configuration failed for package 'Rmpi'
>> | * removing '/usr/local/lib/R/site-library/Rmpi'
>> | ---
>> |
>> | I've read the Rmpi news,
>> | http://r.789695.n4.nabble.com/Problem-installing-Rmpi-with-Open-MPI-tt4641762.html#none
>> | and http://www.open-mpi.org/community/lists/devel/2012/04/10840.php and
>> | then tried to install Rmpi using a new build of OpenMPI, that I configured
>> | this way:
>> | $ ./configure --with-tm=/opt/torque --prefix=/opt/openmpi_disable_dlopen --disable-dlopen
>> | But I got the same error (configure: error: Cannot find orted. Rmpi needs
>> | orted to run.).
>> |
>> | Am I doing something wrong? Do you have a clue on how can I install Rmpi?
>> | I'd also want to understand more about what does --disable-dlopen mean, why
>> | it's necessary for Rmpi and what happens when I run other MPI software when
>> | OpenMPI has been configured with --disable-dlopen. May you share me some
>> | reading?
>> |
>> | Thanks in advance.
>> | Alejandro
>> |
>> |       [[alternative HTML version deleted]]
>> |
>> | _______________________________________________
>> | R-sig-hpc mailing list
>> | R-sig-hpc at r-project.org
>> | https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>
>> --
>> http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
>>

Best Regards,
--
Eiji NAKAMA <nakama (a) ki.rim.or.jp>
"中間栄治"  <nakama (a) ki.rim.or.jp>


