[R-sig-hpc] Rmpi and mpirun
vincent.boucher.u using gmail.com
Fri Jul 13 15:06:02 CEST 2018
Thanks for the suggestion. I could not find anything informative in the debug
output (or at least nothing that I could interpret). However, it made me think
of playing around with the options a bit, and I think I found the issue (i.e.
it works at the moment!). I'm not sure why this fixes it, but I'm posting it
here in case someone has a similar issue.
I didn't describe the setup, but I have a Beowulf-type cluster (a bunch of
old computers linked through Ethernet cables). The thing is that Open MPI
first tries to find InfiniBand connections (which, obviously, I don't
have). Normally this is not an issue (at least it isn't for my Fortran90
codes), but it seems to interfere with Rmpi. Running with the option
-mca btl ^openib (which excludes the openib transport), as in:
mpirun -n 1 -mca btl ^openib --hostfile hostfile.txt R --interactive
solves the problem.
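To avoid typing the flag on every run, the same exclusion can probably be set
through Open MPI's general MCA parameter mechanisms. This is a minimal sketch,
not something tested on this cluster: it assumes a Bourne-style shell and the
per-user parameter file ~/.openmpi/mca-params.conf (the same file used in the
quoted reply below); hostfile.txt is the hostfile from this thread.

```shell
# Option 1: environment variable, read by mpirun in this shell session.
# OMPI_MCA_<name> is Open MPI's naming scheme for MCA parameters.
export OMPI_MCA_btl='^openib'

# Option 2: per-user parameter file, applied to every later mpirun.
mkdir -p "$HOME/.openmpi"
echo 'btl = ^openib' >> "$HOME/.openmpi/mca-params.conf"

# With either option in place, the command needs no -mca flag:
#   mpirun -n 1 --hostfile hostfile.txt R --interactive
```

Either option alone should be enough; the environment variable is the easier
one to undo if it turns out not to help.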
It is weird, since neither R nor mpirun issues any related error message.
Anyway, many thanks!
On Fri, Jul 13, 2018 at 1:17 AM Ei-ji Nakama <nakama using ki.rim.or.jp> wrote:
> When you debug the OpenMPI process...
> Read the result of the following command
> $ ompi_info --param btl base --level 9
> Maybe first time...try following command
> $ mpirun --mca btl_base_verbose 40 -np 1 R --interactive
> ----<write script>----
> Debugging parameter file can also be written below
> $ mkdir -p ~/.openmpi
> $ echo "btl_base_verbose = 40" > ~/.openmpi/mca-params.conf
> 2018-07-12 5:31 GMT+09:00 Vincent Boucher <vincent.boucher.u using gmail.com>:
> > Hi,
> > I'm running into an issue using Rmpi with Open MPI on a beowulf cluster.
> > The installation of the package went without any issue. I have done the
> > following:
> > 'mpirun -n 1 --hostfile hostfile.txt R --interactive'
> > then, 'library(Rmpi)'.
> > when I do 'ns <- mpi.universe.size()' I get ns=12, which is what it is
> > supposed to be. However, 'mpi.spawn.Rslaves(nslaves=ns)' fails and I get
> Since you have already started the MPI master process,
> `mpi.universe.size() - 1'
> will be the number of slaves that can be activated.
> > the "not enough slots available..." message.
> > It looks like when R opens, the nodes are already up and running (due to
> > the mpirun) so mpi.spawn fails... I've tried to launch R directly (without
> > mpirun) but then, I only get 1 node...
> See below, orte_default_hostfile
> ompi_info --params orte all --level 9
> $ echo 'orte_default_hostfile = "~/hostfile.txt"' >>
> For the host file format, refer to the following.
> > Am I missing something?
> > many thanks
> > Vincent
> > _______________________________________________
> > R-sig-hpc mailing list
> > R-sig-hpc using r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> Best Regards,
> Eiji NAKAMA <nakama (a) ki.rim.or.jp>
> "中間栄治" <nakama (a) ki.rim.or.jp>