[R-sig-hpc] simple question on R/Rmpi/snow/slurm configuration

Martin Morgan mtmorgan at fhcrc.org
Mon Jan 5 22:29:49 CET 2009


Whit Armstrong wrote:
> Thanks, Martin.
> 
> I am able to load the Rmpi package when I run the command you suggest.
> However, when I call makeMPIcluster, the object returned is always
> null:

Pass makeMPIcluster() a 'count' argument giving the number of nodes to 
launch, e.g. makeMPIcluster(7).

I think makeMPIcluster() looks at mpi.comm.size(), rather than 
mpi.universe.size(), to decide how many nodes to launch.
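
A minimal sketch of that suggestion, assuming Rmpi and snow are installed 
and R was started inside a slurm allocation as in the session quoted 
below. The helper names worker_count() and start_cluster() are made up 
for illustration; the idea is to size the cluster from 
mpi.universe.size() and keep one slot for the master R process.

```r
# Hypothetical helper: number of workers to request for a given MPI
# universe size, keeping one slot for the master R process.
worker_count <- function(universe_size) {
  max(as.integer(universe_size) - 1L, 0L)
}

# Hypothetical wrapper: spawn a snow MPI cluster sized from the universe.
# Assumes R is running under orterun inside an salloc allocation.
start_cluster <- function() {
  n <- Rmpi::mpi.universe.size()
  snow::makeMPIcluster(count = worker_count(n))
}

# Interactive use (not run here):
#   library(Rmpi); library(snow)
#   cl <- start_cluster()
#   clusterCall(cl, function() Sys.info()[["nodename"]])
#   stopCluster(cl)
#   mpi.quit()
```

With the 8-slot allocation shown below this would ask for 7 workers, 
which matches makeMPIcluster(7).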

A caveat (maybe others will chime in): I don't usually use slurm or 
snow, so I don't have much experience with the specifics of this setup.

Martin

> [warmstrong at linuxsvr ~]$ salloc -n 8 orterun -np 1 R --vanilla
> salloc: Granted job allocation 84
> 
> R version 2.8.0 (2008-10-20)
> Copyright (C) 2008 The R Foundation for Statistical Computing
> ...
> ...
>> library(Rmpi)
> library(Rmpi)
> [linuxsvr.kls.corp:09097] mca: base: component_find: unable to open
> osc pt2pt: file not found (ignored)
>> library(snow)
> library(snow)
>>  cl <- getMPIcluster()
>  cl <- getMPIcluster()
>> cl
> cl
> NULL
>>  mpi.universe.size()
>  mpi.universe.size()
> [1] 8
> 
> Any suggestions?
> 
> Thanks,
> Whit
> 
> 
> On Mon, Jan 5, 2009 at 3:46 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>> "Whit Armstrong" <armstrong.whit at gmail.com> writes:
>>
>>> I'm attempting to get Dirk's example from the "intro to HPC with R"
>>> talk working (http://dirk.eddelbuettel.com/papers/bocDec2008introHPCwithR.pdf).
>>>
>>> I have slurm working correctly (all the trivial hostname examples
>>> complete successfully).
>>>
>>> I fire up an R session w/ the following command
>>>
>>> salloc orterun -n 7 R --vanilla
>> I think you want to salloc your universe, and then run R on one node
>> of the universe
>>
>> salloc -n 7 orterun -np 1 R --vanilla
>>
>> then
>>
>>> library(Rmpi)
>>> mpi.universe.size()
>> will report 7.
>>
>> Martin
>>
>>> and then run
>>> suppressMessages(library(Rmpi))
>>>
>>> but my console never returns control.
>>>
>>> it's just frozen until I <control-c> out of it at which point I get
>>> this message:
>>>> suppressMessages(library(Rmpi))
>>> [linuxsvr.kls.corp:05875] mca: base: component_find: unable to open
>>> osc pt2pt: file not found (ignored)
>>> orterun: killing job...
>>>
>>> orterun noticed that job rank 0 with PID 5875 on node node0 exited on
>>> signal 15 (Terminated).
>>> salloc: Relinquishing job allocation 70
>>> [warmstrong at linuxsvr ~]$
>>>
>>> meanwhile squeue shows:
>>>
>>> [warmstrong at linuxsvr ~]$ squeue
>>>   JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
>>>      71      prod  orterun warmstro   R       0:31      1 node0
>>> [warmstrong at linuxsvr ~]$
>>>
>>>
>>> Have I missed something crucial?  Should I only be running these
>>> examples in batch mode or with littler?
>>>
>>> Thanks in advance,
>>> Whit
>>>
>>> _______________________________________________
>>> R-sig-hpc mailing list
>>> R-sig-hpc at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>>
>> Location: Arnold Building M2 B169
>> Phone: (206) 667-2793
>>




