[Rd] after some time R stopped returning from Rmpi calls
Sklyar, Oleg (London)
osklyar at maninvestments.com
Thu Jan 29 16:09:07 CET 2009
Hi,
this is not exactly a developer question, but maybe you have noticed
similar behaviour before. For quite some time R and Rmpi were working
perfectly for me until one day they just stopped doing so without any
changes in the configs. R still spawns jobs as requested, and if they
are small they run through and return, but as soon as their duration is
over 5s or so the spawned processes go to sleep and never return to the
head node. Below is the top output from one of the slave nodes with the spawned
jobs; as you can see, their status is sleeping. It looks like a communication
problem between the master and the slave nodes, but this behaviour *is*
user specific: exactly the same script will work for some users and will
just lead to hanging for others.
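A minimal sketch of a test that would exercise this behaviour, for anyone trying to reproduce it. It is not the original script: the helper name sleep_test, the slave count and the sleep durations are all assumptions, and it presumes a LAM universe has already been booted with lamboot.

```r
# Hypothetical helper: ask every spawned slave to sleep for each duration
# and report how long the master waits for the replies.
# Assumes LAM is already booted (lamboot) and Rmpi is installed.
sleep_test <- function(durations = c(1, 3, 10), nslaves = 2) {
  library(Rmpi)
  mpi.spawn.Rslaves(nslaves = nslaves)
  # Shut the slaves down again if the function returns normally or errors.
  on.exit(mpi.close.Rslaves(), add = TRUE)
  for (d in durations) {
    cat("requesting Sys.sleep(", d, ") on each slave\n")
    # mpi.remote.exec blocks until every slave has replied, so a hang at
    # this call for the longer durations would match the symptom described.
    elapsed <- system.time(mpi.remote.exec(Sys.sleep, d))[["elapsed"]]
    cat("  returned after", round(elapsed, 1), "s\n")
  }
  invisible(TRUE)
}

# Usage (only on a machine with a running LAM universe):
# sleep_test(durations = c(1, 3, 10), nslaves = 8)
```

Running the same call as an affected user and as an unaffected user would show whether the duration threshold alone separates the two cases.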
Rmpi is installed with a default R CMD INSTALL without additional
arguments. LD_LIBRARY_PATH is set and the whole setup *was* working with
the same config.
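Since the behaviour is user specific, one way to look for a difference is to compare the environment seen by R for a working and a failing user. The sketch below is only an illustration: the list of variables and the helper name env_diff are assumptions, not something taken from the actual setup.

```r
# Variables that plausibly affect a LAM/Rmpi session; this list is an
# assumption and should be adjusted to the local installation.
vars <- c("LD_LIBRARY_PATH", "PATH", "LAMHOME", "LAMRSH", "TMPDIR", "R_LIBS")

# Named character vector; variables that are not set come back as NA.
env_snapshot <- Sys.getenv(vars, unset = NA)
print(env_snapshot)

# Hypothetical helper: names of the variables whose values differ between
# two snapshots taken with the same 'vars' (e.g. one per user).
env_diff <- function(a, b) {
  differs <- is.na(a) != is.na(b) | (!is.na(a) & !is.na(b) & a != b)
  names(a)[differs]
}

# Usage: save env_snapshot for each user with saveRDS(), then compare:
# env_diff(readRDS("env_working.rds"), readRDS("env_failing.rds"))
```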
Has anybody experienced similar problems with Rmpi and LAM before?
Thank you,
Oleg
RHEL 5 x86_64, 16-core Opteron
LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
The version of R I am running is quite dated, but Rmpi is recent.
> sessionInfo()
R version 2.9.0 Under development (unstable) (2008-09-30 r46585)
x86_64-unknown-linux-gnu
locale:
C
attached base packages:
[1] stats graphics utils datasets grDevices methods base
other attached packages:
[1] Rmpi_0.5-5
  PID USER      PR  NI   VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
 7699 osklyar   16   0  19128  1448  1000 S    0  0.0 0:00.02 lamd
 7807 osklyar   16   0   8652   992   824 S    0  0.0 0:00.01 Rslaves.sh
 7808 osklyar   16   0   8656   992   824 S    0  0.0 0:00.01 Rslaves.sh
 7809 osklyar   16   0   8652   992   824 S    0  0.0 0:00.00 Rslaves.sh
 7810 osklyar   17   0   8656   992   824 S    0  0.0 0:00.01 Rslaves.sh
 7811 osklyar   18   0   8656   992   824 S    0  0.0 0:00.02 Rslaves.sh
 7812 osklyar   18   0   8656   992   824 S    0  0.0 0:00.02 Rslaves.sh
 7813 osklyar   18   0   8656   992   824 S    0  0.0 0:00.02 Rslaves.sh
 7814 osklyar   18   0   8656   992   824 S    0  0.0 0:00.02 Rslaves.sh
 7815 osklyar   15   0   165m   60m  4568 S    0  0.2 0:03.66 R
 7816 osklyar   16   0   161m   56m  4568 S    0  0.2 0:03.51 R
 7817 osklyar   15   0   161m   56m  4584 S    0  0.2 0:03.82 R
 7818 osklyar   16   0   161m   56m  4568 S    0  0.2 0:03.31 R
 7819 osklyar   16   0   165m   61m  4568 S    0  0.2 0:03.59 R
 7820 osklyar   15   0   162m   58m  4568 S    0  0.2 0:03.43 R
 7821 osklyar   16   0   162m   58m  4568 S    0  0.2 0:03.26 R
 7824 osklyar   16   0   161m   56m  4568 S    0  0.2 0:03.49 R
 7973 osklyar   15   0  87208  1880  1140 S    0  0.0 0:00.00 sshd
 7974 osklyar   15   0  72332  1716  1276 S    0  0.0 0:00.01 bash
Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
osklyar at maninvestments.com