[R-sig-hpc] mpirun and R

Stephen Weston stephen.b.weston at gmail.com
Fri Jun 1 22:12:29 CEST 2012


So you wanted 12 CPUs on a single node, but the 12 spawned
R processes were all scheduled by your OS on a single CPU
rather than spread across the CPUs/cores on that node?

If so, that suggests that the CPU affinity of the R processes has
somehow been set. We've seen this type of problem when using GotoBLAS2/OpenBLAS.
Has your R installation been built with either of them?
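One way to confirm the affinity diagnosis is to inspect the CPU mask of each R process while the cluster is up. The following is a sketch, not part of the original reply. It assumes a Linux node, where `/proc/<pid>/status` contains a `Cpus_allowed_list` field, and assumes that the master and the spawned workers all show up under the process name `R`. The helper name `cpu_affinity` is hypothetical. A process that reports a single CPU (e.g. `0`) instead of the full range (e.g. `0-11`) has been pinned.

```shell
#!/bin/sh
# Hypothetical helper: print which CPUs a process is allowed to run on.
# Assumes Linux, where /proc/<pid>/status has a Cpus_allowed_list line.
cpu_affinity() {
  pid=$1
  # Output looks like "Cpus_allowed_list:<TAB>0-11" (all cores allowed)
  # or "Cpus_allowed_list:<TAB>0" (pinned to a single core).
  grep '^Cpus_allowed_list:' "/proc/$pid/status"
}

# Check every running process whose name is exactly "R"
# (assumed to cover both the master and the Rmpi-spawned workers).
for pid in $(pgrep -x R); do
  printf 'PID %s -> ' "$pid"
  cpu_affinity "$pid"
done
```

If the workers turn out to be pinned, `taskset -cp 0-11 <pid>` (from util-linux) should widen the mask of an already-running process. The more durable fix is probably to stop the BLAS library from setting affinity in the first place, for example by building OpenBLAS with `NO_AFFINITY=1`. That assumes OpenBLAS is in fact the cause on this system.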

- Steve


On Fri, Jun 1, 2012 at 11:52 AM, Jonathan Greenberg <jgrn at illinois.edu> wrote:
> R-sig-hpc'ers:
>
> Our system (running openmpi) allows for an interactive session to be
> created with N CPUs allotted to it (12 in my case).  Here's
> the qsub command to get the interactive node running:
>
> qsub -X -I -q [mygroup] -l nodes=1:ppn=12,walltime=48:00:00
>
> If I boot R and then try some HPC R commands e.g.:
>
> require(raster)
> # Note this is just a wrapper for a snow call:
> beginCluster()
>
> I get:
>> beginCluster()
> Loading required package: snow
> 12 cores detected
> cluster type: MPI
> Loading required package: Rmpi
>        12 slaves are spawned successfully. 0 failed.
>
> If I run "top", I see that I have 12 (13?) R processes running.  The problem
> is, they are all running on a SINGLE cpu, not distributed amongst all
> 12 cpus (even though it detected them).  My first question is: why is
> this?  Is there a way to fix this from a standard "R" launch?
>
> Now, I can SOMEWHAT fix this by:
> `which mpirun` -n 1 -machinefile $PBS_NODEFILE R --vanilla
>
> When I run the same commands, they distribute properly to all 12 cpus,
> BUT ANY typing error I make causes the entire session to "die":
>> require(raster)
> require(raster)
> Loading required package: raster
> Loading required package: sp
> raster 1.9-92 (1-May-2012)
>> beginCluster()
> beginCluster()
> Loading required package: snow
> 12 cores detected
> cluster type: MPI
> Loading required package: Rmpi
>        12 slaves are spawned successfully. 0 failed.
>> abc
> Error: object 'abc' not found
> Execution halted
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 28932 on
> node [mynode] exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> Is there a way to allow me a "safer" mpirun launch that won't die if I
> make a small typo?  This makes it REALLY hard to troubleshoot code if
> any little error causes the quit.
>
> --j
>
>
> --
> Jonathan A. Greenberg, PhD
> Assistant Professor
> Department of Geography and Geographic Information Science
> University of Illinois at Urbana-Champaign
> 607 South Mathews Avenue, MC 150
> Urbana, IL 61801
> Phone: 415-763-5476
> AIM: jgrn307, MSN: jgrn307 at hotmail.com, Gchat: jgrn307, Skype: jgrn3007
> http://www.geog.illinois.edu/people/JonathanGreenberg.html
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


