[R] Memory distribution using foreach
Simon Zehnder
szehnder at uni-bonn.de
Sat Sep 28 00:05:27 CEST 2013
First of all, LSF is batch-scheduling software; jobs are usually submitted to it as a script (here with the extension .lsf). On most clusters you can switch compilers with 'module switch <module to unload> <module to load>', and MPI-2 is version 2 of the Message Passing Interface standard. This question would also fit better on the high-performance R list (R-sig-hpc).
Next, doMC is a multicore backend that, AFAIK, registers cores on a single machine, i.e. all 24 cores have to be on one host. Inform yourself about the hardware of your cluster (there should also be introductory presentations online): knowing what hardware you use and what architecture it has is the first step. Try 'bhosts' in your shell to see which hosts are available. If you want to use several machines, your foreach backend should be doMPI rather than doMC (see http://cran.r-project.org/web/packages/doMPI/vignettes/doMPI.pdf).
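Besides 'bhosts', you can check from within R how many cores a machine has before registering doMC workers on it. A minimal sketch using parallel::detectCores(); note that it counts the cores of the host R is running on, not the slots LSF has reserved for your job, and requested_cores is simply the value used in this thread.

```r
library(parallel)

## Number of workers you intend to pass to registerDoMC();
## 24 is the value used in this thread.
requested_cores <- 24L

## detectCores() reports the CPUs of the machine R is running on.
## It can return NA on platforms where the count cannot be determined.
available_cores <- detectCores()

if (is.na(available_cores)) {
  message("Could not determine the number of cores on this host.")
} else if (requested_cores > available_cores) {
  message(sprintf("This host has only %d cores; registering %d doMC workers would oversubscribe it.",
                  available_cores, requested_cores))
} else {
  message(sprintf("This host has %d cores; %d doMC workers fit on it.",
                  available_cores, requested_cores))
}
```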
Once you have found your host, write an LSF script like the following one (for OpenMP on ONE machine using 24 cores; in most cases this suffices). It is also faster, because a job that needs just ONE machine usually waits less in the queue. If your cluster has BULL nodes, take those: they have a lot of cores (32/64…) and a lot of memory.
So in your case, write a script with the extension .lsf containing:
#!/usr/bin/env zsh
### using the zsh shell (the #! line must be the first line of the script)
### Job name
#BSUB -J OpenMP
### File/path where output will be written, the %J is the job ID
#BSUB -o OpenMP.%J
### (OFF) Different file for STDERR, if not to be merged with STDOUT
# #BSUB -e OpenMP.e%J
### Request the time you need for execution in minutes
### The format for the parameter is: [hour:]minute,
### that means for 80 minutes you could also use this: 1:20
#BSUB -W 3:00
### Request the virtual memory you need for your job in MB (per process)
#BSUB -M 1024
### Request a higher stack size (per process)
#BSUB -S 1024
### Request the number of compute slots you want to use
#BSUB -n 24
### Specify your mail address
#BSUB -u pkount@bgc-jena.mpg.de
### Send a mail when job is done
#BSUB -N
### Use esub for OpenMP
#BSUB -a openmp
### (OFF) As R is usually compiled with gcc, I would load the gcc module on your cluster
# module switch pgi gcc/4.6
### (OFF) load an OpenMP version other than the default one (check which one is loaded by default; it should now be OpenMP 4.0)
# module switch openmp openmp/3.0
### Set stack and address limits
ulimit -s unlimited
ulimit -v unlimited
### Change to the work directory
cd /home/your_username/
### Execute your application (make sure that R can be started via 'R' in the shell!)
R --no-restore --no-save --quiet --slave < your_R_script.R
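To see what the -M and -n settings above add up to, here is a small R calculation. It assumes, as the comment in the script says, that -M is a per-process limit and that one R process runs per slot. How your own cluster enforces memory limits (and how it treats a request such as rusage[mem=30000]) may differ, so check its documentation.

```r
## Values taken from the LSF script above.
## Assumption: -M is a per-process limit and one R worker runs per slot.
mem_per_process_mb <- 1024   # "#BSUB -M 1024"
n_slots <- 24                # "#BSUB -n 24"

## Upper bound on the memory the whole job may use.
total_mb <- mem_per_process_mb * n_slots
print(total_mb)        # 24576 MB, i.e. 24 GB

## Conversely: with a 30000 MB budget for the whole job, split evenly
## over the workers, each worker could use at most this much.
per_worker_mb <- floor(30000 / n_slots)
print(per_worker_mb)   # 1250 MB
```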
------------------------
In your R script file, load the packages and register the backend:
library(doMC)
library(foreach)
registerDoMC(24)  ## now foreach knows the backend
foreach(...) %dopar% { ... }
## save your results to your work or home directory (CSV file or database)
quit()
-----------------------
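Note that registerDoMC(24) forks 24 workers no matter how many slots LSF has reserved, so the loop runs in parallel even without -n 24; the -n request is what tells the scheduler to keep those cores for you. Instead of hard-coding 24 in both places, you can read the slot count in R. A minimal sketch, assuming your LSF installation exports the environment variable LSB_DJOB_NUMPROC (check your cluster documentation); the toy workload and the file name results.csv are hypothetical.

```r
library(foreach)
library(doMC)

## Assumption: LSF exports LSB_DJOB_NUMPROC, the number of slots granted
## by "#BSUB -n". Outside LSF it is unset, so fall back to 1 core.
n_slots <- suppressWarnings(as.integer(Sys.getenv("LSB_DJOB_NUMPROC", unset = "1")))
if (is.na(n_slots) || n_slots < 1L) n_slots <- 1L

## Register exactly as many forked workers as LSF reserved slots.
registerDoMC(cores = n_slots)

## Toy workload (hypothetical): one task per worker, results combined
## into a numeric vector with .combine = c.
results <- foreach(i = seq_len(n_slots), .combine = c) %dopar% {
  mean(rnorm(1e5))
}

## Save the results as a CSV file in the current working directory.
write.csv(data.frame(task = seq_len(n_slots), value = results),
          file = "results.csv", row.names = FALSE)
```

Because the LSF script changes to your work directory with cd before starting R, results.csv ends up there.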
Then you submit the script to LSF via
bsub < my_LSF_script.lsf
Use 'bjobs' to check whether it has been submitted and what its status is (PEND or RUN). If the status is RUN, 'bpeek your_job_ID' shows you the output produced so far while the job runs.
Best
Simon
On Sep 27, 2013, at 10:48 PM, pakoun <pkount at bgc-jena.mpg.de> wrote:
> Dear R users,
> I am struggling with memory issues and am trying to understand a few things. I am
> using an LSF cluster with the PGI compiler and parallel MPI-2 computing (whatever
> that means...), and I submit a job like:
>
> bsub -R "rusage[mem=30000]" -q queue -n 24 R CMD BATCH <arguments..> myjob.r ..log
>
> So I am asking for 24 cores and 30 GB of RAM.
>
> Then I am using
> library(doMC)
> registerDoMC(24)
>
> and a foreach loop either simple or nested with the %dopar% command.
>
> 1. Will these 30 GB be distributed among the 24 jobs, or will each take 30 GB?
> 2. If I don't pass the -n 24 argument, the foreach loop still runs in
> parallel, as I can see with the top command. What is the purpose of using it? Just
> to "reserve" the nodes from other users?
>
> Thank you
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.