[R-sig-hpc] problems with Rmpi under PBS

Vikneswaran Gopal viknesh at stat.ufl.edu
Thu Jul 15 08:42:46 CEST 2010


Hi everyone,

I am having problems running Rmpi across several nodes under PBS. When I
run it on several processors on the same node, things are fine. When I
request processors on more than 3 nodes, it segfaults.
The code is very basic: it does nothing but broadcast a function and
then execute it.
################################
mainMainFn <- function(nsim) {
   dummy1 <- function() {
     n <- 200
     k <- 2.3
   }
   outMx <- NULL
   library(rlecuyer)
   mpi.bcast.Robj2slave(dummy1)
   mpi.setup.rngstream()

   for (i in 1:nsim) {
     mpi.bcast.cmd(dummy1)
     cat("nsim", i, "done\n", sep=" ")
   }
   outMx
}

tmp <- mainMainFn(nsim=20)
mpi.close.Rslaves()
mpi.quit()
###############################

The R output file has the following errors:
[r10a-s5:05955] *** Process received signal ***
[r10a-s5:05955] Signal: Segmentation fault (11)
[r10a-s5:05955] Signal code: Address not mapped (1)
[r10a-s5:05955] Failing at address: 0xe8d7978
[r10a-s5:05955] [ 0] /lib64/libc.so.6 [0x3a16a30280]
[r10a-s5:05955] [ 1] /lib64/libc.so.6 [0x3a16a7188c]
[r10a-s5:05955] [ 2] /lib64/libc.so.6(cfree+0x8c) [0x3a16a7590c]
[r10a-s5:05955] [ 3] /lib64/libnss_ldap.so.2 [0x2b02fb0a9f8e]
[r10a-s5:05955] [ 4] /lib64/libnss_ldap.so.2 [0x2b02fb0a7353]
[r10a-s5:05955] [ 5] /lib64/libnss_ldap.so.2 [0x2b02fb09f616]
[r10a-s5:05955] [ 6] /lib64/libnss_ldap.so.2 [0x2b02fb08e6ef]
[r10a-s5:05955] [ 7] /lib64/libnss_ldap.so.2 [0x2b02fb09177a]
[r10a-s5:05955] [ 8] /lib64/libc.so.6(__libc_fork+0x1b2) [0x3a16a99952]
[r10a-s5:05955] [ 9] /lib64/libc.so.6(_IO_proc_open+0xad) [0x3a16a62cbd]
[r10a-s5:05955] [10] /lib64/libc.so.6(popen+0x69) [0x3a16a62f19]
[r10a-s5:05955] [11] /usr/lib64/R/lib/libR.so [0x303458b6c9]
[r10a-s5:05955] [12] /usr/lib64/R/lib/libR.so [0x30344fe09e]
[r10a-s5:05955] [13] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [14] /usr/lib64/R/lib/libR.so [0x30344c6983]
[r10a-s5:05955] [15] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [16] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x2a5) [0x30344c4935]
[r10a-s5:05955] [17] /usr/lib64/R/lib/libR.so(Rf_eval+0x328) [0x30344c20e8]
[r10a-s5:05955] [18] /usr/lib64/R/lib/libR.so [0x30344c279d]
[r10a-s5:05955] [19] /usr/lib64/R/lib/libR.so(Rf_eval+0x4fd) [0x30344c22bd]
[r10a-s5:05955] [20] /usr/lib64/R/lib/libR.so [0x30344c279d]
[r10a-s5:05955] [21] /usr/lib64/R/lib/libR.so(Rf_eval+0x4fd) [0x30344c22bd]
[r10a-s5:05955] [22] /usr/lib64/R/lib/libR.so [0x30344c7438]
[r10a-s5:05955] [23] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [24] /usr/lib64/R/lib/libR.so [0x30344c6983]
[r10a-s5:05955] [25] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [26] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [27] /usr/lib64/R/lib/libR.so [0x30344c6983]
[r10a-s5:05955] [28] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] [29] /usr/lib64/R/lib/libR.so(Rf_eval+0x456) [0x30344c2216]
[r10a-s5:05955] *** End of error message ***

Any ideas on what I might be doing wrong?

Vik


