[R-sig-hpc] simple question on R/Rmpi/snow/slurm configuration

Whit Armstrong armstrong.whit at gmail.com
Mon Jan 5 22:04:14 CET 2009


Thanks, Martin.

I am able to load the Rmpi package when I run the command you suggest.
However, when I call makeMPIcluster, the object returned is always
null:

[warmstrong@linuxsvr ~]$ salloc -n 8 orterun -np 1 R --vanilla
salloc: Granted job allocation 84

R version 2.8.0 (2008-10-20)
Copyright (C) 2008 The R Foundation for Statistical Computing
...
...
> library(Rmpi)
[linuxsvr.kls.corp:09097] mca: base: component_find: unable to open
osc pt2pt: file not found (ignored)
> library(snow)
> cl <- getMPIcluster()
> cl
NULL
> mpi.universe.size()
[1] 8
>

Any suggestions?

Thanks,
Whit


On Mon, Jan 5, 2009 at 3:46 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> "Whit Armstrong" <armstrong.whit at gmail.com> writes:
>
>> I'm attempting to get Dirk's example from the "intro to HPC with R"
>> talk working (http://dirk.eddelbuettel.com/papers/bocDec2008introHPCwithR.pdf).
>>
>> I have slurm working correctly (all the trivial hostname examples
>> complete successfully).
>>
>> I fire up an R session w/ the following command
>>
>> salloc orterun -n 7 R --vanilla
>
> I think you want to salloc your universe, and then run R on one node
> of the universe
>
> salloc -n 7 orterun -np 1 R --vanilla
>
> then
>
>> library(Rmpi)
>> mpi.universe.size()
>
> will report 7.
>
> Martin
>
>> and then run
>> suppressMessages(library(Rmpi))
>>
>> but my console never returns control.
>>
>> it's just frozen until I <control-c> out of it at which point I get
>> this message:
>>> suppressMessages(library(Rmpi))
>> [linuxsvr.kls.corp:05875] mca: base: component_find: unable to open
>> osc pt2pt: file not found (ignored)
>> orterun: killing job...
>>
>> orterun noticed that job rank 0 with PID 5875 on node node0 exited on
>> signal 15 (Terminated).
>> salloc: Relinquishing job allocation 70
>> [warmstrong@linuxsvr ~]$
>>
>> meanwhile squeue shows:
>>
>> [warmstrong@linuxsvr ~]$ squeue
>>   JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
>>      71      prod  orterun warmstro   R       0:31      1 node0
>> [warmstrong@linuxsvr ~]$
>>
>>
>> Have I missed something crucial?  Should I only be running these
>> examples in batch mode or with littler?
>>
>> Thanks in advance,
>> Whit
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>
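
A note on the NULL result in the transcript at the top of this message: the
session there calls getMPIcluster(), which, as far as I understand snow, only
hands back a cluster that has already been created and registered, and so
returns NULL when nothing has been created yet. A minimal sketch of creating
the cluster explicitly with makeMPIcluster() is below. It assumes R was
started with "salloc -n 8 orterun -np 1 R --vanilla" as above, that Rmpi and
snow are installed, and that this Open MPI/slurm setup allows workers to be
spawned; the variable name n_workers is only illustrative.

```r
# Sketch only: assumes an allocation of several slots with a single R master,
# e.g. started via: salloc -n 8 orterun -np 1 R --vanilla
library(Rmpi)
library(snow)

# Keep one slot for the master R process; the rest become workers.
# (n_workers is an illustrative name, not something from the thread.)
n_workers <- mpi.universe.size() - 1

# makeMPIcluster() spawns the workers and registers the cluster with snow,
# so a later getMPIcluster() is expected to return it instead of NULL.
cl <- makeMPIcluster(n_workers)

# Quick sanity check: ask every worker for its host name.
print(clusterCall(cl, function() Sys.info()[["nodename"]]))

# Shut the workers down and leave MPI cleanly.
stopCluster(cl)
mpi.quit()
```

If spawning fails or hangs on this setup, the problem is more likely in the
Open MPI/slurm configuration than in the R code itself.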


