[R-sig-hpc] mpirun and R

Jonathan Greenberg jgrn at illinois.edu
Sat Jun 2 18:46:32 CEST 2012


Steve:

It was built with OpenBLAS, but does that matter for an MPI-based
function? (I had thought GotoBLAS/OpenBLAS was an entirely separate
HPC aspect, used only for linear algebra routines.) But yes, all the
spawned R processes end up running on a single CPU, whereas if I
launch under mpirun they distribute properly.  I had to roll OpenBLAS
myself on this system, because the admins have only installed Intel
MKL, which I have yet to get to play nicely with R.  OpenBLAS does
work for linear algebra commands, though.
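
For what it's worth, here is what I am planning to try next based on
that affinity theory.  Everything below is an assumption on my part:
NO_AFFINITY, OPENBLAS_MAIN_FREE, and OPENBLAS_NUM_THREADS are OpenBLAS
build/runtime settings (not R ones), and I have not verified any of
them on this cluster yet:

# OpenBLAS/GotoBLAS2 reportedly pins the process to a core, and any R
# slave spawned from that process inherits the pinning.  Two possible
# workarounds (untested here):
#   1. rebuild OpenBLAS with `make NO_AFFINITY=1` so it never sets affinity
#   2. disable the pinning at run time, before R touches BLAS:
Sys.setenv(OPENBLAS_MAIN_FREE = "1")
# and keep BLAS single-threaded so it does not fight with the snow workers:
Sys.setenv(OPENBLAS_NUM_THREADS = "1")
# (exporting these in the qsub job script before R starts is probably
# the more reliable place to set them)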

# In fact, running this in a normally launched R session uses all cores:
a = matrix(rnorm(5000*5000), 5000, 5000)
b = matrix(rnorm(5000*5000), 5000, 5000)
c = a%*%b

# But then, in the same session, running:
require(raster)
beginCluster()
# Only spawns on one core.

Are there "better" parameters I might pass to snow to get this
working?  I get the same behavior with snowfall's sfInit():

require(snowfall)
sfInit(parallel=TRUE,cpus=12)
sfStop()
# All spawns execute on a single CPU
sfInit(parallel=TRUE,cpus=12,type="MPI")
sfStop()
# All spawns execute on a single CPU
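
In the meantime, here is a diagnostic plus an alternative I am trying;
treat it as a sketch.  The taskset call assumes a Linux compute node
with util-linux's taskset installed, and type="SOCK" is snowfall's
socket cluster, which stays on the local node and bypasses Rmpi/mpirun
entirely:

require(snowfall)

# Try a socket cluster instead of MPI:
sfInit(parallel = TRUE, cpus = 12, type = "SOCK")

# Ask every worker for its PID and the set of CPUs it is allowed to
# run on; if the affinity theory is right, each line should show only
# core 0:
sfClusterCall(function() {
    paste(Sys.getpid(),
          system(sprintf("taskset -cp %d", Sys.getpid()), intern = TRUE))
})

sfStop()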

Incidentally (and I don't consider this a perfectly satisfactory
answer, so please keep the suggestions coming), this command at least
lets me run R in interactive mode and doesn't bail every time I type
an incorrect statement:

`which mpirun` -n 1 -machinefile $PBS_NODEFILE R --interactive
(note the --interactive flag instead of --vanilla)

That said, if I need to kill a process with Ctrl-C (which usually
just returns me to an R prompt), R does bail all the way back to the
bash command line.  The other reason I'd like a within-R solution is
that I do my development in the StatET/Eclipse environment, and (at
least right now) there is no way to modify how it launches R remotely.
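
One more thing I plan to test, for the "exiting without calling
finalize" error in my first message: a .Last hook along the lines of
the one shipped in Rmpi's Rprofile, dropped into the working
directory's .Rprofile.  Untested on this system, so consider it a
sketch:

.Last <- function() {
    # If Rmpi has been loaded, shut the slaves down and finalize MPI so
    # that mpirun exits cleanly instead of reporting that rank 0 died:
    if (is.loaded("mpi_initialize")) {
        if (mpi.comm.size(1) > 0) {
            mpi.close.Rslaves()
        }
        mpi.finalize()
    }
}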

--j

On Fri, Jun 1, 2012 at 3:12 PM, Stephen Weston
<stephen.b.weston at gmail.com> wrote:
> So you wanted 12 cpus on a single node, but the 12 spawned
> R processes were all scheduled by your OS on a single cpu
> rather than multiple cpus/cores on that node?
>
> If so, that suggests that somehow the cpu affinity has been set.
> We've seen this type of problem when using GotoBLAS2/OpenBLAS.
> Has your R installation been built with either of them?
>
> - Steve
>
>
> On Fri, Jun 1, 2012 at 11:52 AM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
>> R-sig-hpc'ers:
>>
>> Our system (running openmpi) allows for an interactive session to be
>> created with N number of CPUs allotted to it (12 in my case).  Here's
>> the qsub command to get the interactive node running:
>>
>> qsub -X -I -q [mygroup] -l nodes=1:ppn=12,walltime=48:00:00
>>
>> If I boot R and then try some HPC R commands e.g.:
>>
>> require(raster)
>> # Note this is just a wrapper for a snow call:
>> beginCluster()
>>
>> I get:
>>> beginCluster()
>> Loading required package: snow
>> 12 cores detected
>> cluster type: MPI
>> Loading required package: Rmpi
>>        12 slaves are spawned successfully. 0 failed.
>>
>> If I "top" I see that I have 12 (13?) R spawns running.  The problem
>> is, they are all running on a SINGLE cpu, not distributed amongst all
>> 12 cpus (even though it detected it).  My first question is: why is
>> this?  Is there a way to fix this from a standard "R" launch?
>>
>> Now, I can SOMEWHAT fix this by:
>> `which mpirun` -n 1 -machinefile $PBS_NODEFILE R --vanilla
>>
>> When I run the same commands, they distribute properly to all 12 cpus
>> BUT ANY error I make in typing will cause the entire system to "die":
>>> require(raster)
>> require(raster)
>> Loading required package: raster
>> Loading required package: sp
>> raster 1.9-92 (1-May-2012)
>>> beginCluster()
>> beginCluster()
>> Loading required package: snow
>> 12 cores detected
>> cluster type: MPI
>> Loading required package: Rmpi
>>        12 slaves are spawned successfully. 0 failed.
>>> abc
>> Error: object 'abc' not found
>> Execution halted
>> --------------------------------------------------------------------------
>> mpirun has exited due to process rank 0 with PID 28932 on
>> node [mynode] exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------
>>
>> Is there a way to get a "safer" mpirun launch that won't die if I
>> make a small typo?  It makes it REALLY hard to troubleshoot code
>> when any little error kills the whole session.
>>
>> --j
>>



-- 
Jonathan A. Greenberg, PhD
Assistant Professor
Department of Geography and Geographic Information Science
University of Illinois at Urbana-Champaign
607 South Mathews Avenue, MC 150
Urbana, IL 61801
Phone: 415-763-5476
AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
http://www.geog.illinois.edu/people/JonathanGreenberg.html


