[R-sig-hpc] Rmpi hanging on slave nodes
Sklyar, Oleg (London)
osklyar at maninvestments.com
Mon Feb 2 18:27:24 CET 2009
Dear Hao, dear list:
For some time now I have had strange issues with Rmpi: the head node
spawns jobs on the slave nodes, these run for a couple of seconds, then
hang and never return to the head node (see the 'top' output below). The
issue occurs *only* when I run a function like the one below
(simplified) as part of a package. If I run the same code from the
global environment, or if I run the domaccs below broadcasting a
function that is defined in the global environment, then everything
runs through.
Of the 10 custom packages that I have, the above problem occurs for
only about two, and only if I import them into the calling package. None
of the others has any adverse effect. I tried stripping functionality
from the two packages in question to rule out name clashes etc., but
could not locate any particular function that would cause a clash; at
some point, if the package is attached, Rmpi simply stops working. I
understand that this information is very vague, but I was hoping that
somebody has already had a similar problem and knows the answer. The
same issue occurs with Rmpi 0.5-5 through 0.5-7.
I would be grateful for any ideas.
Thanks,
Oleg
PS: I am sorry I cannot post the exact code as it is not open source,
but then it would be too large anyway.
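sim = function(tsdata, ...) {
    # Helper that distributes one simulation per element of tsdata
    domaccs = function(tsdata, ...) {
        # Spawn one slave per data set, without per-slave log files
        mpi.spawn.Rslaves(nslaves=length(tsdata), needlog=FALSE)
        res = mpi.parLapply(tsdata,
            function(data, ...) {
                # Load the package on the slave before calling into it
                require(Sim)
                singleSim(data, ...)
            }, ..., job.num=length(tsdata))
        mpi.close.Rslaves()
        res
    }
    domaccs(tsdata, ...)
}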
The system is RHEL5 64-bit with LAM 7.1.4, on a 16-node cluster of Opterons.
> sessionInfo()
R version 2.9.0 Under development (unstable) (2008-09-30 r46585)
x86_64-unknown-linux-gnu
locale:
C
attached base packages:
[1] splines stats graphics utils datasets grDevices methods
[8] base
other attached packages:
[1] Rmpi_0.5-7 Sim_0.2.41 Data_0.2.20 NagLib_0.1.7
[5] Finance_0.1.77 DBConn_0.2.24 ROracle_0.5-9 RODBC_1.2-3
[9] DBI_0.2-4 Calendar_0.2.88 Base_0.1.36
loaded via a namespace (and not attached):
[1] tools_2.9.0
Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
osklyar at maninvestments.com