[R] Hope you have some time for 2 more questions

Simon Zehnder szehnder at uni-bonn.de
Fri Sep 20 12:08:29 CEST 2013


Hi,

As far as I know, there is no limitation on data size with regard to foreach. You should, though, reserve enough memory for your application on the cluster (via ulimit -s unlimited and ulimit -v unlimited).
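As a sketch, the ulimit calls would go at the top of the LSF job script, before R is started (the resource requests and script name are placeholders, not your actual setup):

```shell
#!/bin/sh
#BSUB -n 16              # illustrative: request 16 slots from LSF
#BSUB -R "span[hosts=1]" # illustrative: keep them on one host

# Lift the soft limits on stack size and virtual memory for this job
# (this can only raise them up to the hard limits set by the admins).
ulimit -s unlimited
ulimit -v unlimited

R CMD BATCH myscript.R   # placeholder script name
```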

Furthermore, I would check the following:
Check whether there are two versions of R on the cluster or in your home directory on the frontend (LSF loads this frontend environment and uses the R version installed there). If you have two R executables (R and R64), make sure you use the 64-bit version.
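A quick way to see which R the frontend environment resolves to, and whether it is a 64-bit build (the paths are illustrative):

```shell
# Which R does the login environment pick up?
which R

# Is it a 64-bit build? Look for x86_64 in the platform string.
R --version | grep "Platform"

# Are there separate R and R64 executables next to it?
ls "$(dirname "$(which R)")" | grep -x 'R\|R64'
```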

Run R and call memory.limit() to see what the memory limits on your system are.
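For instance (note that memory.limit() is really meaningful on Windows builds; on Unix-alikes it reports Inf and the effective limits come from ulimit and the batch system instead):

```r
# Report R's memory limit: on Windows this is the cap in MB,
# on Unix-alikes it is Inf (the OS enforces the limits there).
memory.limit()
```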

If this is below the size you need, increase it by calling R in the LSF script with the option --max-mem-size=YourSize; and if you get errors of the kind "cannot allocate vector of size", you should also use --max-vsize=YourVSize.
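In the LSF script that could look like the following (the size is a placeholder; note that --max-mem-size is only recognized by Windows builds, whereas --max-vsize also works on Unix-alikes):

```shell
# Illustrative size: allow the vector heap to grow to 16 GB.
# Options placed after BATCH are passed through to R itself.
R CMD BATCH --max-vsize=16G myscript.R myscript.Rout
```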

Then check whether there is a memory leak in your application: if you compiled R with --enable-memory-profiling, you can use Rprof to do this; otherwise you must rely on the profiling instruments provided by the cluster environment (I think you work with modules there as well, so type 'module avail' in the shell to list the available modules).
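A minimal sketch of memory profiling with Rprof (the file name and the stand-in workload are illustrative; this assumes an R build configured with --enable-memory-profiling):

```r
# Start the profiler with memory profiling enabled.
Rprof("profile.out", memory.profiling = TRUE)

# ... the code under investigation; a stand-in workload here:
x <- lapply(1:200, function(i) sum(sort(rnorm(1e5))))

Rprof(NULL)   # stop profiling

# Summarize, including memory columns alongside timing.
head(summaryRprof("profile.out", memory = "both")$by.total)
```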

If you detect a memory leak, or if you see that at certain points in your algorithm some objects are no longer used, call rm(objectName) and then gc() for garbage collection.
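A minimal sketch of dropping an intermediate object once it is no longer needed (the object names and sizes are illustrative):

```r
big     <- matrix(rnorm(1e6), ncol = 1000)  # ~8 MB of doubles
partial <- crossprod(big)                   # intermediate result
result  <- sum(diag(partial))               # only this is kept

rm(big, partial)  # remove the references ...
gc()              # ... and ask R to release the memory
```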


To your nested loop using foreach: that is a highly delicate issue in parallel computing, and for the foreach syntax I refer you to the must-read http://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf.

Nested loops should be used carefully with regard to how the nesting is organized. In C++ you have the ability to determine how many cores should work on which loop. In the foreach environment using doMC, this does not seem to be possible.
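That said, the %:% operator from the vignette at least lets foreach fuse the two loops into a single stream of tasks, so the registered workers are shared across both levels instead of being tied to the outer loop only (the backend and worker count here are illustrative):

```r
library(foreach)
library(doMC)      # doMC works on Unix-alikes only
registerDoMC(2)    # illustrative worker count

# %:% turns the nested loops into one flat stream of (i, j) tasks;
# the scheduler, not the loop structure, decides which core runs what.
res <- foreach(i = 1:4, .combine = rbind) %:%
  foreach(j = 1:3, .combine = c) %dopar% {
    i * j
  }
res
# a 4 x 3 matrix containing the products i * j
```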


And please keep the discussion on the r-help mailing list, so that others can learn from it and researchers with more experience can also leave comments.


Best

Simon


On Sep 19, 2013, at 9:24 PM, pkount at bgc-jena.mpg.de wrote:

> Hi again,
> 
> if you have some time, I would like to bother you again with 2 more questions. After your response the parallel code is working perfectly, but when I apply it to the real case (big matrices) I get an error about a non-numeric dimension, and I guess that it again returns NULL or something. Are you aware whether a foreach loop can only handle objects up to a certain size? The equation that I am using includes 3 objects of 2 GB each.
> 
> The second question has to do with the cores that foreach uses. Although I ask our cluster (LSF) to give me a certain number of CPUs, and I am also specifying that with
> library(doMC)
> registerDoMC(n) 
> 
> it seems from the top command that I am using all the cores. I am using 2 foreach loops as a nest:
>   foreach(i in 1:16) {
>     foreach(j in 1:10)  etc etc..
> Maybe I should do something with this kind of nest? I am not aware of that.
> 
> I am sorry for the long text, and thank you for your nice solution.
> 
> _____________________________________
> Sent from http://r.789695.n4.nabble.com
> 
