[R-sig-hpc] problems with Rmpi under PBS
Vikneswaran Gopal
viknesh at stat.ufl.edu
Thu Jul 15 08:42:46 CEST 2010
Hi everyone,
I am having problems running Rmpi across several nodes under PBS. When I
run it on several processors on the same node, things are fine. When I
request processors on more than 3 nodes, it segfaults.
The code is very basic: it does nothing but broadcast a function and
then execute it.
################################
mainMainFn <- function(nsim) {
  # Trivial function to be sent to the slaves
  dummy1 <- function() {
    n <- 200
    k <- 2.3
  }
  outMx <- NULL
  library(rlecuyer)
  # Send dummy1 to every slave, then set up the parallel RNG streams
  mpi.bcast.Robj2slave(dummy1)
  mpi.setup.rngstream()
  for (i in 1:nsim) {
    mpi.bcast.cmd(dummy1)
    cat("nsim", i, "done\n", sep=" ")
  }
  outMx
}
tmp <- mainMainFn(nsim=20)
mpi.close.Rslaves()
mpi.quit()
###############################
The R output file has the following errors:
[r10a-s5:05955] *** Process received signal ***
[r10a-s5:05955] Signal: Segmentation fault (11)
[r10a-s5:05955] Signal code: Address not mapped (1)
[r10a-s5:05955] Failing at address: 0xe8d7978
[r10a-s5:05955] [ 0] /lib64/libc.so.6 [0x3a16a30280]
[r10a-s5:05955] [ 1] /lib64/libc.so.6 [0x3a16a7188c]
[r10a-s5:05955] [ 2] /lib64/libc.so.6(cfree+0x8c) [0x3a16a7590c]
[r10a-s5:05955] [ 3] /lib64/libnss_ldap.so.2 [0x2b02fb0a9f8e]
[r10a-s5:05955] [ 4] /lib64/libnss_ldap.so.2 [0x2b02fb0a7353]
[r10a-s5:05955] [ 5] /lib64/libnss_ldap.so.2 [0x2b02fb09f616]
[r10a-s5:05955] [ 6] /lib64/libnss_ldap.so.2 [0x2b02fb08e6ef]
[r10a-s5:05955] [ 7] /lib64/libnss_ldap.so.2 [0x2b02fb09177a]
[r10a-s5:05955] [ 8] /lib64/libc.so.6(__libc_fork+0x1b2) [0x3a16a99952]
[r10a-s5:05955] [ 9] /lib64/libc.so.6(_IO_proc_open+0xad) [0x3a16a62cbd]
[r10a-s5:05955] [10] /lib64/libc.so.6(popen+0x69) [0x3a16a62f19]
[r10a-s5:05955] [11] /usr/lib64/R/lib/libR.so [0x303458b6c9]
[r10a-s5:05955] [12] /usr/lib64/R/lib/libR.so [0x30344fe09e]
[r10a-s5:05955] [13] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [14] /usr/lib64/R/lib/libR.so [0x30344c6983]
[r10a-s5:05955] [15] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [16] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x2a5) [0x30344c4935]
[r10a-s5:05955] [17] /usr/lib64/R/lib/libR.so(Rf_eval+0x328) [0x30344c20e8]
[r10a-s5:05955] [18] /usr/lib64/R/lib/libR.so [0x30344c279d]
[r10a-s5:05955] [19] /usr/lib64/R/lib/libR.so(Rf_eval+0x4fd) [0x30344c22bd]
[r10a-s5:05955] [20] /usr/lib64/R/lib/libR.so [0x30344c279d]
[r10a-s5:05955] [21] /usr/lib64/R/lib/libR.so(Rf_eval+0x4fd) [0x30344c22bd]
[r10a-s5:05955] [22] /usr/lib64/R/lib/libR.so [0x30344c7438]
[r10a-s5:05955] [23] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [24] /usr/lib64/R/lib/libR.so [0x30344c6983]
[r10a-s5:05955] [25] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [26] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [27] /usr/lib64/R/lib/libR.so [0x30344c6983]
[r10a-s5:05955] [28] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [29] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] *** End of error message ***
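To make the trace easier to scan, here is a minimal base-R sketch that reduces backtrace lines to frame number, library and symbol. The helper name summarize_frames() is hypothetical (it is not part of Rmpi or Open MPI), and the sample lines are a handful of frames copied from the trace above.

```r
# A few frames copied verbatim from the trace above (innermost first).
trace <- c(
  "[r10a-s5:05955] [ 2] /lib64/libc.so.6(cfree+0x8c) [0x3a16a7590c]",
  "[r10a-s5:05955] [ 3] /lib64/libnss_ldap.so.2 [0x2b02fb0a9f8e]",
  "[r10a-s5:05955] [ 8] /lib64/libc.so.6(__libc_fork+0x1b2) [0x3a16a99952]",
  "[r10a-s5:05955] [10] /lib64/libc.so.6(popen+0x69) [0x3a16a62f19]",
  "[r10a-s5:05955] [11] /usr/lib64/R/lib/libR.so [0x303458b6c9]"
)

# Hypothetical helper: one row per recognised frame line.
summarize_frames <- function(lines) {
  # Captures: \1 frame number, \2 shared-object path, \4 symbol (if any).
  pat <- "^\\[[^]]+\\] \\[ *([0-9]+)\\] (/[^ (]+)(\\(([^+)]+)[^)]*\\))?.*$"
  hits <- lines[grepl(pat, lines, perl = TRUE)]
  data.frame(
    frame  = as.integer(sub(pat, "\\1", hits, perl = TRUE)),
    lib    = basename(sub(pat, "\\2", hits, perl = TRUE)),
    # Frames without a symbol name yield an empty string here.
    symbol = sub(pat, "\\4", hits, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

frames <- summarize_frames(trace)
print(frames)
```

Read this way, the innermost frames sit in libc and libnss_ldap, beneath __libc_fork and popen, which are in turn called from libR.so. That suggests the fault happens while R is starting a child process, but the trace alone does not say why.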
Any ideas on what I might be doing wrong?
Vik