[R-sig-hpc] multi-threaded R/MPI jobs using SGE

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Wed Sep 22 15:18:59 CEST 2010


Your question is more advanced than my current experience.  But I can
tell you what I do with SGE when I use multicore (where all cores
should be on the same node).

Our sysadmin has set up a parallel environment called "local", which
makes sure that all the cores I request will be on the same node.  I
use it like this:
  qsub -pe local 6-12
The 6-12 means give me between 6 and 12 cores (in my experience SGE
always gives me the maximum I request, but that probably depends on the
cluster setup).  The actual number of cores I get is stored in the
environment variable NSLOTS (or is it N_SLOTS, I don't recall), so in R I do

library(multicore)   # provides mclapply(); in recent R it lives in the 'parallel' package
CORES <- as.integer(Sys.getenv("NSLOTS"))
mclapply(LIST, FUN, mc.cores = CORES)
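
If you run the same script outside SGE (e.g. interactively, while testing),
NSLOTS will be unset, as.integer() gives NA, and mclapply() will choke on
mc.cores = NA, so a small guard may be worth adding; just a sketch:

cores <- as.integer(Sys.getenv("NSLOTS"))
if (is.na(cores) || cores < 1L) cores <- 1L   # not under SGE: fall back to a single core
mclapply(LIST, FUN, mc.cores = cores)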

You clearly have a more advanced use case, but I would guess that
someone has done it (perhaps not using R).  I would furthermore guess
that the way to do it is the same as above: your allocated resources
get stored in some environment variable that you then read from R and
feed into the doMPI setup.
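
Purely as an untested sketch of what I imagine the doMPI side could look
like (do_work() is a made-up placeholder for your per-task computation, and
coresPerWorker would have to come from SGE's allocation somehow, which is
exactly the part I don't know):

library(foreach)
library(doMPI)       # foreach backend on top of Rmpi
library(multicore)   # mclapply() for the within-host parallelism

cl <- startMPIcluster()   # one worker per MPI process started by mpirun
registerDoMPI(cl)

coresPerWorker <- 4       # placeholder: should really come from the SGE allocation

results <- foreach(task = 1:10, .packages = "multicore") %dopar% {
    # each of the 10 tasks runs on one MPI worker and fans out locally via mclapply()
    mclapply(1:coresPerWorker,
             function(i) do_work(task, i),   # do_work() is hypothetical
             mc.cores = coresPerWorker)
}

closeCluster(cl)
mpi.quit()

Again, getting the per-host slot count from SGE into coresPerWorker is
exactly the question you are asking, so treat the constant 4 above as the
missing piece.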

Kasper


On Wed, Sep 22, 2010 at 7:08 AM, Renaud Gaujoux
<renaud at mancala.cbio.uct.ac.za> wrote:
> Hi,
>
> I want to run a multi-threaded MPI job on our local cluster (Rocks + SGE).
> My R script uses the doMPI/doMC packages to compute, say, 10 tasks.
> I'd like to compute each task using as many CPUs as are available on a worker
> host (meaning all the slots available on it and assigned by SGE).
> Suppose I know each host has 4 slots.
>
> 1. SGE question: ideally I'd like to be able to ask for, say, 9 slots,
> allocated across 3 hosts (4 + 4 + 1), using the isolated slot to run the master
> thread.
> This way I can spawn one master and two 4-core workers to perform the tasks.
> I read that one can configure an SGE parallel environment (passed to qsub via
> -pe <pe_name>) to ensure the allocation follows this rule.
> Does anybody have this kind of environment available on their cluster?
> Would this one work?
>
> pe_name           mtmpich
> slots             9999
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args    /opt/gridengine/mpi/stopmpi.sh
> allocation_rule   $pe_slots
> control_slaves    TRUE
> job_is_first_task FALSE
> urgency_slots     min
>
>
> 2. Rmpi/doMPI question: suppose I cannot be sure the slots are allocated in
> such a way, say I get (2+2+3+2). This means that other users are using the
> other CPUs, which I do not want to oversubscribe. Currently, when I call
> registerDoMC(), it registers all 4 CPUs, which interferes with the other users' jobs.
> Is it possible, from within R, to figure out which worker host is running the
> code and the number of CPUs I am allowed to use on it?
>
> 3. If anybody has successfully done this kind of thing, please let me know
> how.
>
> Thank you.
> Renaud
>
> --
> Renaud Gaujoux
> Computational Biology - University of Cape Town
> South Africa
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>


