[R-sig-hpc] Rmpi spawning across nodes.

Stephen Weston stephen.b.weston at gmail.com
Mon Apr 9 22:55:55 CEST 2012


Hi Ben,

What machines are listed when you execute:

  cat $PBS_NODEFILE

in your batch script?  Is it definitely four different nodes?
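
A quick way to answer that from inside the job is to tally the node file instead of reading it by eye. The bash function below is a minimal sketch. It assumes a PBS/Torque setup in which $PBS_NODEFILE lists one hostname per allocated processor slot. The name summarize_nodefile is hypothetical. With #PBS -l nodes=4:ppn=1 you would hope to see four distinct hosts with one slot each.

```bash
#!/bin/bash
# Hypothetical helper: summarize a PBS/Torque node file, which lists one
# hostname per allocated processor slot.
# Prints "<slots> <hostname>" for each host, then the number of distinct hosts.
summarize_nodefile() {
    local nodefile="${1:-$PBS_NODEFILE}"
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo "cannot read node file: '$nodefile'" >&2
        return 1
    fi
    # uniq -c only merges adjacent duplicates, so sort first.
    sort "$nodefile" | uniq -c
    local distinct
    distinct=$(sort -u "$nodefile" | wc -l)
    # $((...)) drops the padding that some wc implementations add.
    echo "distinct hosts: $((distinct))"
}
```

If you call it with no argument inside the batch script, it reads $PBS_NODEFILE. A report of "distinct hosts: 1" would mean that PBS assigned all four slots to a single node.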

- Steve


On Mon, Apr 9, 2012 at 3:28 PM, Ben Weinstein
<bweinste at life.bio.sunysb.edu> wrote:
> Hi Stephen,
>
> I've tried to follow your answer, but I'm still getting the same results.
> The heart of my qsub script looks like this:
>
> mpirun -hostfile $PBS_NODEFILE -np 1 R --slave -f \
>   /nfs/user08/bw4sz/Files/Seawulf.R
>
>
> Before I run the foreach statement, I ask which node I am on:
> [1] "Original Node wulfie121"
>
> I check that the Open MPI library is there:
> [1] "/usr/local/pkg/openmpi-1.4.4/lib/"
>
> I make the cluster and ask how many slaves were spawned:
> 4 slaves are spawned successfully. 0 failed.
>
> Then I ask for the node name of each of my slaves. I believe that if
> this were working correctly, each node name would be different, since
> I specified #PBS -l nodes=4:ppn=1.
>
> However, all four slaves still spawn on that one node:
> [[1]]
>    nodename     machine
> "wulfie121"    "x86_64"
>
> [[2]]
>    nodename     machine
> "wulfie121"    "x86_64"
>
> [[3]]
>    nodename     machine
> "wulfie121"    "x86_64"
>
> [[4]]
>    nodename     machine
> "wulfie121"    "x86_64"
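
Another check that removes R from the picture is to launch a trivial program through the same mpirun and see where the copies run. This is a minimal sketch. It assumes Open MPI's mpirun is available inside the PBS job. The function check_mpi_spread and the MPIRUN override variable are hypothetical names. If even hostname lands on a single host, the problem probably lies in the PBS/Open MPI setup rather than in snow or Rmpi. If it does spread across nodes, the focus moves to how the R workers are spawned.

```bash
#!/bin/bash
# Hypothetical diagnostic: run NP copies of `hostname` through mpirun and
# count the distinct hosts they ran on. Assumes Open MPI's mpirun inside a
# PBS job, so $PBS_NODEFILE is set. MPIRUN may name a different launcher.
check_mpi_spread() {
    local np="${1:-4}"
    local launcher="${MPIRUN:-mpirun}"
    local out distinct
    # Declared above, so this line keeps the launcher's own exit status.
    if ! out=$("$launcher" -hostfile "$PBS_NODEFILE" -np "$np" hostname); then
        echo "launcher failed: $launcher" >&2
        return 2
    fi
    distinct=$(printf '%s\n' "$out" | sort -u | wc -l)
    distinct=$((distinct))  # strip padding some wc implementations add
    echo "$np processes ran on $distinct distinct host(s)"
    # Succeed only if the processes were spread over more than one host.
    [ "$distinct" -gt 1 ]
}
```

In the batch script, you could call check_mpi_spread 4 before the mpirun line that starts R.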
>
> Finally, I'm timing the process to see whether I'm actually
> getting parallelization:
> [1] 4
>    user  system elapsed
>  17.650  39.990 159.632
>
> Again, the heart of the code looks like this:
>
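> library(doSNOW)  # attaches foreach and snow; type = "MPI" also needs Rmpi
> library(plyr)    # for rbind.fill()
>
> # times, iterations and drop.shuffle() are defined elsewhere (not shown)
> cl <- makeCluster(4, type = "MPI")
> print(clusterCall(cl, function() Sys.info()[c("nodename", "machine")]))
> registerDoSNOW(cl)
> print(getDoParWorkers())
> system.time(five.ten <- rbind.fill(foreach(j = 1:times) %dopar%
>   drop.shuffle(j, iterations)))
> stopCluster(cl)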
>
> I am about to switch to a different parallel backend as suggested, but
> I doubt that is the root of the problem in this case.
>
>
> I appreciate the continued help,
>
> Ben Weinstein
>
> On Thu, Mar 29, 2012 at 2:56 PM, Stephen Weston <stephen.b.weston at gmail.com>
> wrote:
>>
>> Hi Ben,
>>
>> You have to run R via mpirun, otherwise all of the workers start
>> on the one node.
>>
>> > I have tried using mpirun -np 4 in front of the R call, but this just
>> > fails without a message.
>>
>> You have to use '-np 1', otherwise your script will be executed
>> by mpirun four times, each trying to spawn four workers.
>> I'm not sure if that explains failing without a message, however.
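
The cost of '-np 4' is easy to count. Let n_p be the -np value and w the number of workers that each copy of the script spawns (w = 4 for makeCluster(4, type = "MPI")). Count each copy of the script as one master R process. If every spawn succeeded, the job would request this many R processes in total:

```latex
N_{\mathrm{R}} = n_p\,(1 + w), \qquad
\text{-np } 4:\; N_{\mathrm{R}} = 4\,(1 + 4) = 20, \qquad
\text{-np } 1:\; N_{\mathrm{R}} = 1\,(1 + 4) = 5
```

Both totals should be compared with the four processor slots that #PBS -l nodes=4:ppn=1 requests.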
>>
>> Try something like this:
>>
>> #!/bin/bash
>> #PBS -o 'qsub.out'
>> #PBS -e 'qsub.err'
>> #PBS -l nodes=4:ppn=1
>> #PBS -m bea
>> cat $PBS_NODEFILE
>> hostname
>>
>> cd $PBS_O_WORKDIR
>>
>> # Run an R script
>> mpirun -hostfile $PBS_NODEFILE -np 1 R --slave -f \
>>   /nfs/user08/bw4sz/Files/Seawulf.R
>>
>> You may not need to use '-hostfile $PBS_NODEFILE', depending on
>> how your Open MPI was built, but I don't think it ever hurts, and
>> it may be required for your installation.
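
Whether -hostfile is needed usually depends on whether Open MPI was built with Torque/PBS ("tm") support. In that case mpirun typically picks up the PBS allocation by itself. The sketch below is minimal and rests on two assumptions. First, ompi_info, which is installed alongside Open MPI, must be on PATH. Second, it must list components in the usual "MCA plm: tm (...)" form. The function name has_tm_support and the OMPI_INFO override are hypothetical.

```bash
#!/bin/bash
# Hypothetical check: was this Open MPI built with Torque/PBS ("tm") support?
# ompi_info lists components on lines such as "MCA plm: tm (...)".
# OMPI_INFO may name a specific ompi_info binary; defaults to the one on PATH.
has_tm_support() {
    "${OMPI_INFO:-ompi_info}" 2>/dev/null | grep -Eq 'MCA (plm|ras): tm'
}
```

A batch script could call has_tm_support first and keep '-hostfile $PBS_NODEFILE' whenever the call returns non-zero. Steve's view is that passing it anyway does not hurt.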
>>
>> - Steve
>
>
>
>
> --
> Ben Weinstein
> Graduate Student
> Ecology and Evolution
> Stony Brook University
>
> http://life.bio.sunysb.edu/~bweinste/index.html
>


