[R-sig-hpc] Rmpi and cpu usage on slaves

Hao Yu hyu at stats.uwo.ca
Wed Apr 22 19:10:50 CEST 2009


As Dirk said, this is a feature of Open MPI. LAM-MPI doesn't have this issue.
I don't think there is a solution on the slave side, since mpi.bcast is a
blocking call. It might be possible to use nonblocking point-to-point
calls such as mpi.irecv together with Sys.sleep, but the whole slave
communication code would have to be rewritten. If Dirk is correct, a future
release of Open MPI will remove this feature. This is why I have not tried to
work out a solution, at least on the slave side. In real computations, all
slaves are supposed to use up all of their assigned CPU cycles anyway.
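
To illustrate the nonblocking idea, here is a minimal sketch, not what Rmpi
does internally. The names wait_with_sleep and slave_loop, the "done"
message, and the reply tag 1 are hypothetical. The slave probes for a message
with mpi.iprobe and sleeps between probes instead of sitting in a blocking
call:

```r
# Call `ready()` until it returns TRUE, sleeping between attempts so the
# process yields the CPU instead of spinning. Returns FALSE on timeout.
wait_with_sleep <- function(ready, interval = 0.05, timeout = Inf) {
  start <- Sys.time()
  repeat {
    if (isTRUE(ready())) return(TRUE)
    elapsed <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (elapsed >= timeout) return(FALSE)
    Sys.sleep(interval)
  }
}

# Hypothetical slave loop (assumes Rmpi is installed and this runs on a slave).
# The "done" message and the reply tag 1 are illustrative assumptions.
slave_loop <- function(comm = 1) {
  repeat {
    # Nonblocking probe for a message from any source with any tag.
    wait_with_sleep(function() {
      Rmpi::mpi.iprobe(Rmpi::mpi.any.source(), Rmpi::mpi.any.tag(), comm)
    })
    st <- Rmpi::mpi.get.sourcetag()            # c(source, tag) of probed message
    msg <- Rmpi::mpi.recv.Robj(st[1], st[2], comm)
    if (identical(msg, "done")) break          # master asks the slave to stop
    Rmpi::mpi.send.Robj(eval(msg), dest = st[1], tag = 1, comm = comm)
  }
}
```

A shorter interval gives faster response at the cost of more wakeups; the
values above are only placeholders.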

The same issue applies to the master as well if any of the parallel apply
functions are used. In Rmpi 0.4-7 several nonblocking parallel apply
functions were added so that the master does not consume 100% CPU while
waiting.
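
As a rough sketch of how this might be used (the wrapper name sapply_low_cpu
and the serial fallback are my assumptions for illustration, not part of
Rmpi), the nonblocking variant mpi.iparSapply takes a sleep argument that sets
how long the master sleeps between checks:

```r
# Hypothetical wrapper: use Rmpi's nonblocking parallel sapply when slaves
# are running, otherwise fall back to a plain serial sapply.
sapply_low_cpu <- function(X, FUN, ..., sleep = 0.01, comm = 1) {
  have_slaves <- requireNamespace("Rmpi", quietly = TRUE) &&
    Rmpi::mpi.comm.size(comm) > 1
  if (have_slaves) {
    # The master sleeps `sleep` seconds between checks instead of busy-waiting.
    Rmpi::mpi.iparSapply(X, FUN, ..., comm = comm, sleep = sleep)
  } else {
    sapply(X, FUN, ...)
  }
}

# Example call (after mpi.spawn.Rslaves() on a working cluster):
# sapply_low_cpu(1:100, function(v) v^2, sleep = 0.05)
```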

So far, LAM-MPI is still the best environment for programming, debugging, and
testing.

Hao


Dirk Eddelbuettel wrote:
>
> On 21 April 2009 at 16:40, Sean Davis wrote:
> | I am running sge6.2, openmpi 1.3.1, and Rmpi 0.5.7 on openSUSE linux.  I can
> | start up an arbitrarily-sized cluster using sge, see the appropriate
> | universe.size using Rmpi, and start a cluster using mpi.spawn.Rslaves().
> | However, it appears that all the slaves then run at 100% cpu on all nodes.
> | Even using Rmpi under openmpi with a simple hostfile produces the same
> | result.  Any suggestions to figure out what is going on on the slaves?
>
> There is a known issue with Open MPI and blocking which you may be hitting
> here.  Upstream Open MPI considers it a feature. But as this has come up a
> few times on their mailing list as well, I believe the last word was that it
> will go away in a future release.
>
> Hth, Dirk
>
> | Thanks,
> | Sean
> |
> |
> | > library(Rmpi)
> | library(Rmpi)
> | > mpi.universe.size()
> | mpi.universe.size()
> | [1] 24
> | > mpi.spawn.Rslaves()
> | mpi.spawn.Rslaves()
> |         24 slaves are spawned successfully. 0 failed.
> | master  (rank 0 , comm 1) of size 25 is running on: Mahfouz
> | slave1  (rank 1 , comm 1) of size 25 is running on: Mahfouz
> | slave2  (rank 2 , comm 1) of size 25 is running on: Mahfouz
> | slave3  (rank 3 , comm 1) of size 25 is running on: Mahfouz
> | slave4  (rank 4 , comm 1) of size 25 is running on: Mahfouz
> | slave5  (rank 5 , comm 1) of size 25 is running on: Mahfouz
> | slave6  (rank 6 , comm 1) of size 25 is running on: Mahfouz
> | slave7  (rank 7 , comm 1) of size 25 is running on: Mahfouz
> | slave8  (rank 8 , comm 1) of size 25 is running on: Grass
> | slave9  (rank 9 , comm 1) of size 25 is running on: Grass
> | slave10 (rank 10, comm 1) of size 25 is running on: Grass
> | slave11 (rank 11, comm 1) of size 25 is running on: Grass
> | slave12 (rank 12, comm 1) of size 25 is running on: Grass
> | slave13 (rank 13, comm 1) of size 25 is running on: Grass
> | slave14 (rank 14, comm 1) of size 25 is running on: Grass
> | slave15 (rank 15, comm 1) of size 25 is running on: Grass
> | slave16 (rank 16, comm 1) of size 25 is running on: shakespeare
> | slave17 (rank 17, comm 1) of size 25 is running on: shakespeare
> | slave18 (rank 18, comm 1) of size 25 is running on: shakespeare
> | slave19 (rank 19, comm 1) of size 25 is running on: shakespeare
> | slave20 (rank 20, comm 1) of size 25 is running on: shakespeare
> | slave21 (rank 21, comm 1) of size 25 is running on: shakespeare
> | slave22 (rank 22, comm 1) of size 25 is running on: shakespeare
> | slave23 (rank 23, comm 1) of size 25 is running on: shakespeare
> | slave24 (rank 24, comm 1) of size 25 is running on: Mahfouz
> | > mpi.close.Rslaves()
> | mpi.close.Rslaves()
> | [1] 1
> |
> | > sessionInfo()    # on the master
> | R version 2.9.0 Under development (unstable) (2009-02-21 r47969)
> | x86_64-unknown-linux-gnu
> |
> | locale:
> | LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> |
> | attached base packages:
> | [1] stats     graphics  grDevices utils     datasets  methods   base
> |
> | other attached packages:
> | [1] Rmpi_0.5-7
> |
> | _______________________________________________
> | R-sig-hpc mailing list
> | R-sig-hpc at r-project.org
> | https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>
> --
> Three out of two people have difficulties with fractions.
>


-- 
Department of Statistics & Actuarial Sciences
Fax Phone#:(519)-661-3813
The University of Western Ontario
Office Phone#:(519)-661-3622
London, Ontario N6A 5B7
http://www.stats.uwo.ca/faculty/yu


