[R-sig-hpc] Difference between PSOCK and MPI?
mtmorgan at fhcrc.org
Thu Apr 11 23:06:49 CEST 2013
On 04/11/2013 01:43 PM, Simon Urbanek wrote:
> On Apr 11, 2013, at 2:23 PM, Marius Hofert wrote:
>> Dirk Eddelbuettel <edd at debian.org> writes:
>>> On 9 April 2013 at 23:24, Marius Hofert wrote:
>>> | What are the main differences (advantages/drawbacks) between parallel's makeCluster(,
>>> | type="PSOCK") and makeCluster(, type="MPI")? According to
>>> | http://cran.r-project.org/web/views/HighPerformanceComputing.html, MPI has
>>> | become the 'standard', although the default type of makeCluster() is "PSOCK". Is
>>> | "PSOCK" more compatible in that it does not need an installation such as
>>> | (open)MPI? slower/faster/no difference?
>>> You don't want PSOCK.
>>> It is the lowest common denominator which even works (for various definitions
>>> of "work") on Windoze.
>> Hi Dirk,
>> thanks for helping. Indeed, we don't want that. The reason I was asking is
>> the following: I am writing a package (jointly with Martin) where we make use of
>> makeCluster(). The 'natural' default would be to use the default of the
>> 'type' argument ("PSOCK") and leave it up to the user to decide if he wants to replace
>> it by "MPI". On the other hand, one could make "MPI" the default advocating
>> 'good practice'. Since we are unsure, I thought I'd ask about the differences
>> between the two to get a better feeling for what a good default would be.
> One thing to consider for a package is that MPI is probably not available in the majority of cases. In fact, the MPI back-end is not even available in parallel by default (it routes to "snow" strangely ...). Although PSOCK is not the best-performing back-end, it is reasonably easy to set up, so I would certainly keep it as the default. If someone knows how to set up MPI then they do know how to set the optional argument, not the other way around ;).
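A minimal sketch of the design suggested above: keep "PSOCK" as the default
and let users who have MPI set up override it. The function name run_study()
and its arguments are hypothetical, made up for illustration; only
makeCluster(), parLapply() and stopCluster() come from parallel. Passing
cluster_type = "MPI" assumes that the snow and Rmpi packages and a working MPI
installation are present.

```r
library(parallel)

# Hypothetical package function; 'run_study', 'n_workers' and 'cluster_type'
# are illustrative names, not taken from any real package.
run_study <- function(x, fun, n_workers = 2L, cluster_type = "PSOCK") {
  # "PSOCK" needs nothing beyond base R; "MPI" is handed off to snow/Rmpi
  cl <- makeCluster(n_workers, type = cluster_type)
  # Shut the workers down even if 'fun' throws an error
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, x, fun)
}

# Default back-end: works without any MPI installation
res <- run_study(1:4, function(i) i^2)
```

A user on an MPI-enabled cluster would then call
run_study(x, fun, cluster_type = "MPI") without the package having to change.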
Might as well throw my opinion into the ring, too...
(a) on a single non-Windows machine there are substantial benefits (e.g., shared
memory) to be had from using multiple cores, and for these you and your users
would much rather use the mc* functions (mclapply() and friends), which fork,
than spawn processes via PSOCK or MPI.
(b) on clusters, it is almost certain that MPI is available / there are people
who can make it available, and one would rather use the Rmpi package directly
(for subtler reasons, e.g., bcast (log N) rather than sequential communication
of data) than the snow-like functions in parallel. But most clusters also
require mastering some kind of batch submission, and many 'ordinary' users will
be overwhelmed and end up submitting independent jobs whose results they then
stitch together in some ad hoc way; the BatchJobs package seems [no direct
experience] to help people come to some kind of intermediate ground.
(c) on a single Windows machine one might as well use PSOCK; the MPI installation
issues are likely to be daunting. I don't know that Windows clusters are common.
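Points (a) and (c) can be sketched roughly as follows: fork with mclapply() on
non-Windows machines, and fall back to a PSOCK cluster on Windows. The helper
name par_apply() and its arguments are hypothetical; this is only one way to
arrange the choice, and it does not cover the cluster case in (b).

```r
library(parallel)

# Hypothetical helper; 'par_apply' and 'n_workers' are illustrative names.
par_apply <- function(x, fun, n_workers = 2L) {
  if (.Platform$OS.type != "windows") {
    # Forked workers see the parent's objects without copying them up front
    mclapply(x, fun, mc.cores = n_workers)
  } else {
    # Forking is not available on Windows, so spawn socket workers instead
    cl <- makeCluster(n_workers, type = "PSOCK")
    on.exit(stopCluster(cl), add = TRUE)
    parLapply(cl, x, fun)
  }
}

res <- par_apply(1:4, function(i) i * 2)
```

Both branches return a list with one element per input, so the calling code
does not need to know which back-end was used.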
>>> As your mail address reveals that you are coming from a serious place,
>>> you should look into MPI.
>> we always used it, that's why it became the (internal) default so far.
>>> Or just use N core machines, where N is as big as your grant allows.
>> ETH Zurich
>> Dr. Marius Hofert
>> RiskLab, Department of Mathematics
>> HG E 65.2
>> Rämistrasse 101
>> 8092 Zurich
>> Phone +41 44 632 2423
>> GPG key fingerprint 8EF4 5842 0EA2 5E1D 3D7F 0E34 AD4C 566E 655F 3F7C
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793