[R-sig-hpc] Reply: Re: Plain: Problem with Rmpi

Stephen Weston stephen.b.weston at gmail.com
Tue Jan 19 17:11:16 CET 2010


From looking over the Rmpi Rprofile, it appears that it is used to put the
cluster workers into a "worker loop" or "task loop", allowing you to use
the higher level functions in Rmpi when spawn isn't supported, or if you
choose to start your workers using orterun, perhaps for performance
reasons.  From your message, I see that you are starting the workers
using orterun, since you specified "-n 4".
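
To make the contrast concrete, here is a rough sketch of the two launch
styles (the option values and paths are illustrative, so adjust them for
your installation):

# Style 1: spawn the workers from the master; no Rprofile is involved.
library(Rmpi)
mpi.spawn.Rslaves(nslaves = 3)              # needs an MPI that supports spawning
mpi.remote.exec(mpi.get.processor.name())   # runs on the slaves, results return to master
mpi.close.Rslaves()
mpi.quit()

# Style 2: let orterun start every R process.  With Rmpi's Rprofile copied
# into the working directory as .Rprofile, ranks 1..n-1 drop into the Rmpi
# worker loop at startup and only rank 0 runs your script, with something like:
#   orterun -n 4 R --no-save -q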

Rmpi's Rprofile definitely isn't compatible with the doMPI package, since
it puts all of the workers into an Rmpi worker loop, ready to execute
"Rmpi tasks", thus preventing them from executing "doMPI tasks".
In doMPI, the startMPIcluster function plays a role very similar to that of
the Rprofile in Rmpi.  That is, it puts all of the workers into a doMPI worker
loop, allowing them to execute tasks sent to them from foreach/%dopar%.
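
Put differently, when the script is launched with orterun every rank runs it,
but the ranks diverge inside startMPIcluster.  A minimal sketch of that
control flow, essentially your mpiTest_04.R with comments (my reading of it,
so treat the comments as approximate):

library(doMPI)
cl <- startMPIcluster()        # workers enter the doMPI worker loop here and
                               # never reach the lines below; only the master returns
registerDoMPI(cl)
foreach(i = 1:3) %dopar% sqrt(i)
closeCluster(cl)               # tells the workers to shut down
mpi.quit()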

Note that doMPI and snow only use the "low level" functions in Rmpi,
and never make use of Rprofile.  They use Rmpi for communication,
not execution purposes.
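
As a rough illustration of that communication-only use (a sketch, not code
from either package), the master might ship an R object to rank 1 and wait
for a reply using the point-to-point calls:

library(Rmpi)
# Send an arbitrary R object to rank 1 with tag 7; a matching
# mpi.recv.Robj/mpi.send.Robj pair is assumed to be running on rank 1.
mpi.send.Robj(list(x = 1:10), dest = 1, tag = 7)
reply <- mpi.recv.Robj(source = mpi.any.source(), tag = mpi.any.tag())
print(reply)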

I hope that explains a bit of what is going on.

- Steve



On Tue, Jan 19, 2010 at 8:58 AM,  <sebastian.rohrer at basf.com> wrote:
> Hao and Stephen,
> thanks a lot. Your comments took me a good step further. But I must
> confess, I am a bit confused about the significance of the .Rprofile
> provided with Rmpi. I'm quite sure this is because I am a total noob
> concerning HPC, but maybe you can help me understand the issue.
> After Stephen's response I changed my script to:
>
> require(Rmpi)
>
> print(mpi.remote.exec(rnorm(5)))
>
> mpi.close.Rslaves()
> mpi.quit()
>
> and called it with
>
> /programs/openmpi-1.3.3/bin/orterun -host node02,node03 -n 4 Rscript
> mpiTest_03.R
>
> which ran without error and produced the following output:
>
> master (rank 0, comm 1) of size 4 is running on: node02
> slave1 (rank 1, comm 1) of size 4 is running on: node03
> slave2 (rank 2, comm 1) of size 4 is running on: node02
> slave3 (rank 3, comm 1) of size 4 is running on: node03
>          X1         X2           X3
> 1  0.6422312 -1.4176550 -0.864957823
> 2 -0.9049865  1.3221402  0.322550244
> 3  1.1318463 -0.3170188 -0.001224240
> 4  0.8153995  1.4860591 -1.507712241
> 5 -0.1545055  0.3834336 -0.104543321
> [1] 1
>
> So everything seems OK: three slaves are running, and
> mpi.remote.exec(rnorm(5)) is therefore executed on all three of them.
> BTW: if I leave out "mpi.close.Rslaves()" the program will produce the
> same output, but then hang indefinitely.
>
> After Hao's response, I removed the Rmpi .Rprofile from the working
> directory. The sample script above wouldn't run anymore, but this is
> expected if I understood Hao's comment correctly.
> What really strikes me is that a test script I prepared for testing
> Stephen's doMPI package (this is my ultimate goal: foreach, iterators and
> doMPI) now ran smoothly:
>
> library(doMPI)
> cl <- startMPIcluster()
> registerDoMPI(cl)
>
> foreach(i = 1:3) %dopar% sqrt(i)
>
> closeCluster(cl)
> mpi.quit()
>
> /programs/openmpi-1.3.3/bin/orterun -host node02,node03 -n 4 Rscript
> mpiTest_04.R
>
> Output:
>
> [[1]]
> [1] 1
>
> [[2]]
> [1] 1.414214
>
> [[3]]
> [1] 1.732051
>
> So this seems to work all right. The same is true for the bootMPI.R example
> provided with doMPI. Neither, however, works if I restore the Rmpi
> .Rprofile.
>
> I'm sorry to bother you with these newbie questions, but I really would
> like to understand what is going on here.
>
> Thanks a lot for your support, and thanks to both of you, Hao and Stephen,
> for providing these great packages!
>
> Cheers, Sebastian
>
>
>
>
>
>
> "Hao Yu" <hyu at stats.uwo.ca>
> 18.01.2010 19:43
> Bitte antworten an
> hyu at stats.uwo.ca
>
>
> An
> "Stephen Weston" <stephen.b.weston at gmail.com>
> Kopie
> sebastian.rohrer at basf.com, "R-sig-hpc at r-project.org"
> <r-sig-hpc at r-project.org>
> Thema
> Re: [R-sig-hpc] Plain: Problem with Rmpi
>
> mpi.spawn.Rslaves is not needed since orterun is used to create 1 master
> and 3 slaves.
>
> I assume the Rprofile from Rmpi is being used. In that case all slaves sit in
> an infinite loop waiting for instructions from the master. They will not process
> any R scripts run in BATCH mode (only the master will process them).
>
> You may use mpi.remote.exec to get what you want or modify slave.hostinfo.
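>
> For example, something along these lines on the master (just a sketch) will
> report where each slave is running:
>
> # Evaluated on every slave; the master collects and prints the results.
> print(mpi.remote.exec(paste("I am rank", mpi.comm.rank(),
>                             "of", mpi.comm.size(),
>                             "on", mpi.get.processor.name())))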
>
> Hao
>
>
> Stephen Weston wrote:
>> Why are you calling mpi.close.Rslaves?  I believe that you should
>> only do that if you've started the slaves via mpi.spawn.Rslaves.
>>
>> I'm not sure if that has any bearing on your problem, however.
>> But I wouldn't draw too many conclusions based on not seeing
>> messages to stdout from the slaves right before quitting,
>> especially messages that are produced on different machines.
>> To improve the chances of it working, I suggest that you
>> add a call to flush(stdout()) after the cat.  But you might want
>> to do something else to prove whether the slaves are really
>> running.
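>>
>> Something like this, just as a sketch of what I mean:
>>
>> rk <- mpi.comm.rank(0)
>> sz <- mpi.comm.size(0)
>> name <- mpi.get.processor.name()
>> cat("Hello, rank", rk, "size", sz, "on", name, "\n")
>> flush(stdout())   # push the message out before the process quits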
>>
>> - Steve
>>
>>
>> On Mon, Jan 18, 2010 at 8:42 AM,  <sebastian.rohrer at basf.com> wrote:
>>> Dear List,
>>>
>>> I have the following problem with Rmpi under R 2.10.0, using OpenMPI 1.3.3:
>>>
>>> I use Dirk Eddelbuettel's test script for checking whether Rmpi works
>>> (http://dirk.eddelbuettel.com/papers/ismNov2009introHPCwithR.pdf; the
>>> example is on Slide 96).
>>>
>>> The script looks as follows:
>>>
>>> require(Rmpi)
>>> rk <- mpi.comm.rank(0)
>>> sz <- mpi.comm.size(0)
>>>
>>> name <- mpi.get.processor.name()
>>> cat("Hello, rank", rk, "size", sz, "on", name, "\n")
>>>
>>> mpi.close.Rslaves()
>>> mpi.quit()
>>>
>>> According to Dirk's slides, the output should look something like:
>>> Hello, rank 4 size 8 on ron
>>> Hello, rank 0 size 8 on ron
>>> Hello, rank 3 size 8 on mccoy
>>> Hello, rank 7 size 8 on mccoy
>>> Hello, rank Hello, rank 21 size 8 on joe
>>> size 8 on tony
>>> Hello, rank 6 size 8 on tony
>>> Hello, rank 5 size 8 on joe
>>>
>>> I call the script:
>>>
>>> /programs/openmpi-1.3.3/bin/orterun -host node02,node03 -n 4
>>> /programs/R/R-2.10.0/bin/Rscript mpiTest_03.R
>>>
>>> The output looks like:
>>>
>>> master (rank 0, comm 1) of size 4 is running on: node02
>>> slave1 (rank 1, comm 1) of size 4 is running on: node03
>>> slave2 (rank 2, comm 1) of size 4 is running on: node02
>>> slave3 (rank 3, comm 1) of size 4 is running on: node03
>>> Hello, rank 0 size 4 on node02
>>>
>>> So, as I understand it, a master and 3 slaves are started. But the call for
>>> the processor name is executed only on the master, right?
>>> Any suggestions on what the problem could be?
>>>
>>> Thanks a lot!
>>>
>>> Sebastian
>>>
>>
>
>
> --
> Department of Statistics & Actuarial Sciences
> The University of Western Ontario
> London, Ontario N6A 5B7
> Office Phone#: (519)-661-3622
> Fax Phone#: (519)-661-3813
> http://www.stats.uwo.ca/faculty/yu
>
>
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>


