[R-sig-hpc] Unreproducible crashes of R instances on cluster running Torque

Till Francke win at comets.de
Thu May 2 11:14:16 CEST 2013


Dear List,
I am a user of a Linux cluster running Torque.
I want to run "embarrassingly parallel" R jobs (no worker interaction,
no MPI/multicore, just simple replicates of a script with different
arguments). Whenever I submit more than ~30 of these, I run into
problems: some jobs run fine, others terminate with R error messages about
memory allocation, or even finish without any further output, sometimes
crashing a node of the cluster. Each of these scripts runs fine when started
alone.
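
For illustration only, here is a minimal sketch of the kind of submission described above: one independent single-core Torque job per replicate, each with an explicit memory request so that the scheduler does not pack more R processes onto a node than its RAM can hold. The script name replicate.R, the job names rep_N and the 2gb request are hypothetical placeholders, not values taken from the actual setup. By default the function only prints the qsub commands (dry run).

```shell
#!/bin/sh
# Hypothetical sketch: one Torque job per replicate, each with a memory request.
# DRY_RUN=1 (the default) only prints the commands instead of submitting them.
DRY_RUN="${DRY_RUN:-1}"
SCRIPT="replicate.R"   # hypothetical R script taking the replicate index as argument
MEM="2gb"              # hypothetical per-job memory request

submit_replicates() {
    n="$1"   # number of replicates to submit
    i=1
    while [ "$i" -le "$n" ]; do
        # Command line the job will execute on the compute node.
        job="cd \$PBS_O_WORKDIR && Rscript --vanilla $SCRIPT $i"
        if [ "$DRY_RUN" = "1" ]; then
            # Show what would be submitted.
            printf "echo '%s' | qsub -N rep_%s -l nodes=1:ppn=1,mem=%s\n" \
                "$job" "$i" "$MEM"
        else
            # qsub reads the job script from standard input.
            printf '%s\n' "$job" | qsub -N "rep_$i" -l "nodes=1:ppn=1,mem=$MEM"
        fi
        i=$((i + 1))
    done
}
```

Calling submit_replicates 30 prints thirty qsub lines; setting DRY_RUN=0 beforehand would actually submit them. Whether a mem request is enforced, and how many jobs may share a node, depends on the local Torque and scheduler configuration.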
My admin suggests this is a memory leak in R. However, even if that
were the case, I wonder whether it should be able to stall the cluster.
Could anyone give me some advice on how to address this, please?

Thanks,

Till


Scientific Linux SL release 5.5 (Boron)
Linux head 2.6.18-348.1.1.el5 #1 SMP Tue Jan 22 16:26:03 EST 2013 x86_64  
x86_64 x86_64 GNU/Linux
R version 2.12.1 (2010-12-16)





