[R-sig-Debian] rmpi/snow grabs all available CPU

Manuel Prinz debian at pinguinkiste.de
Fri May 9 11:19:40 CEST 2008


[ Please CC me on list replies, I'm not subscribed ]

On Thursday, 08.05.2008, at 11:19 -0500, Dirk Eddelbuettel wrote:
> I think what you see is actually Open MPI which polls all cpus all the time
> _on purpose_.  I am a bit hazy on the details but IIRC this may change in the
> upcoming 1.3 release.  CC'ing Manuel who may remember better than I do.

This issue comes up every now and then. Dirk, you're right, it
is indeed on purpose. One of the arguments is that OpenMPI is "optimized
for minimum message passing latency on hosts that are not
oversubscribed" (list quote) and polling is quite good at that. Blocking
simply isn't, and one usually runs the CPU at 100% when doing heavy
computations. (As a side note: From a user perspective, this has never
been an issue for me, since my jobs are very computationally expensive.)
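
To make the polling-versus-blocking trade-off concrete, here is a minimal
sketch in plain Python. It is not Open MPI code and does not use Rmpi or
snow. The names cpu_cost_of_wait, poll and block are made up for
illustration, and a threading.Timer stands in for a message that arrives
after a short delay. It only shows why a polling waiter keeps a CPU busy
while a blocking waiter does not.

```python
import threading
import time


def cpu_cost_of_wait(wait, delay=0.2):
    """Return the CPU seconds this process uses while waiting for a 'message'.

    `wait` is a function that returns once the event it is given is set.
    The timer thread is a stand-in for a message arriving after `delay` s.
    """
    arrived = threading.Event()
    timer = threading.Timer(delay, arrived.set)
    start = time.process_time()  # CPU time of the process, not wall-clock time
    timer.start()
    wait(arrived)
    used = time.process_time() - start
    timer.join()
    return used


def poll(arrived):
    # Busy-wait: check the flag over and over without giving up the CPU.
    # Reacts quickly once the flag is set, but burns cycles in the meantime.
    while not arrived.is_set():
        pass


def block(arrived):
    # Blocking wait: the thread sleeps until it is woken up.
    # Uses almost no CPU, at the cost of a wake-up delay.
    arrived.wait()


if __name__ == "__main__":
    print(f"polling : {cpu_cost_of_wait(poll):.3f} s CPU")
    print(f"blocking: {cpu_cost_of_wait(block):.3f} s CPU")
```

On an otherwise idle machine the polling variant should report CPU time
close to the delay and the blocking variant close to zero. Exact numbers
will vary from system to system.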

For those interested, there's a longer discussion of that in the thread
starting here:
http://www.open-mpi.org/community/lists/users/2008/04/5457.php

Faheem, I can only second what Dirk expressed in one of his emails: If
possible, do not fall back on LAM. It's no longer maintained and
probably buggy in some cases. That said, you still have two options with
OpenMPI:

1. You can try the patch mentioned in
http://www.open-mpi.org/community/lists/users/2008/04/5481.php. It's not
a real blocking strategy but may work. (Or may not, I did not try it.)

2. The OpenMPI people are working on a blocking strategy, but it's very
low priority and to the best of my knowledge not implemented yet. You
could wait for it to happen or maybe help with the implementation.

Although there seems to be no good solution, it's IMHO still better to
waste cycles for now than to fall back to LAM. Just my 2 cents.

Best regards
Manuel


