[R-sig-hpc] problems with Rmpi under PBS

Hao Yu hyu at stats.uwo.ca
Fri Jul 16 16:09:26 CEST 2010


Can you check whether Rmpi runs on its own on the remote nodes? If the remote
nodes have a different OS or hardware, Rmpi must be compiled separately for them.

Notice that
           mpi.bcast.cmd(dummy1)
does not execute the function dummy1. It should be
           mpi.bcast.cmd(dummy1())
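
To illustrate the difference without MPI, here is a minimal sketch in plain R.
It assumes the slaves simply evaluate whatever expression is broadcast to them.
run_on_slave is a hypothetical helper and not part of Rmpi, and the return
value n * k is added to dummy1 only so that the effect of calling it is visible.

```r
# Stand-in for the function from the original post; the final line is an
# addition for illustration so that calling it returns something visible.
dummy1 <- function() {
  n <- 200
  k <- 2.3
  n * k
}

# Hypothetical stand-in for what a slave does with a broadcast command:
# it evaluates the unevaluated expression it receives.
run_on_slave <- function(cmd) eval(cmd)

name_only <- run_on_slave(quote(dummy1))    # evaluates the symbol only
with_call <- run_on_slave(quote(dummy1()))  # actually calls the function

is.function(name_only)  # TRUE: the function object came back, nothing ran
is.numeric(with_call)   # TRUE: the body ran and returned a number
```

Evaluating the bare name only looks the function up, so the loop in the
original script would finish without dummy1 ever running on the slaves.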

Hao

Vikneswaran Gopal wrote:
> Hi everyone,
>
> I am having problems running Rmpi on several nodes under PBS. When I
> run it on several processors on the same node, things are fine. When I
> start requesting processors on more than 3 nodes, it segfaults.
> The code is very basic: it does nothing but broadcast a function and
> then execute it.
> ################################
> mainMainFn <- function(nsim) {
>    dummy1 <- function() {
>      n <- 200
>      k <- 2.3
>    }
>    outMx <- NULL
>    library(rlecuyer)
>    mpi.bcast.Robj2slave(dummy1)
>    mpi.setup.rngstream()
>
>    for (i in 1:nsim) {
>      mpi.bcast.cmd(dummy1)
>      cat("nsim", i, "done\n", sep=" ")
>    }
>    outMx
> }
>
> tmp <- mainMainFn(nsim=20)
> mpi.close.Rslaves()
> mpi.quit()
> ###############################
>
> The R output file has the following errors:
> [r10a-s5:05955] *** Process received signal ***
> [r10a-s5:05955] Signal: Segmentation fault (11)
> [r10a-s5:05955] Signal code: Address not mapped (1)
> [r10a-s5:05955] Failing at address: 0xe8d7978
> [r10a-s5:05955] [ 0] /lib64/libc.so.6 [0x3a16a30280]
> [r10a-s5:05955] [ 1] /lib64/libc.so.6 [0x3a16a7188c]
> [r10a-s5:05955] [ 2] /lib64/libc.so.6(cfree+0x8c) [0x3a16a7590c]
> [r10a-s5:05955] [ 3] /lib64/libnss_ldap.so.2 [0x2b02fb0a9f8e]
> [r10a-s5:05955] [ 4] /lib64/libnss_ldap.so.2 [0x2b02fb0a7353]
> [r10a-s5:05955] [ 5] /lib64/libnss_ldap.so.2 [0x2b02fb09f616]
> [r10a-s5:05955] [ 6] /lib64/libnss_ldap.so.2 [0x2b02fb08e6ef]
> [r10a-s5:05955] [ 7] /lib64/libnss_ldap.so.2 [0x2b02fb09177a]
> [r10a-s5:05955] [ 8] /lib64/libc.so.6(__libc_fork+0x1b2) [0x3a16a99952]
> [r10a-s5:05955] [ 9] /lib64/libc.so.6(_IO_proc_open+0xad) [0x3a16a62cbd]
> [r10a-s5:05955] [10] /lib64/libc.so.6(popen+0x69) [0x3a16a62f19]
> [r10a-s5:05955] [11] /usr/lib64/R/lib/libR.so [0x303458b6c9]
> [r10a-s5:05955] [12] /usr/lib64/R/lib/libR.so [0x30344fe09e]
> [r10a-s5:05955] [13] /usr/lib64/R/lib/libR.so(Rf_eval+0x456)
> [0x30344c2216]
> [r10a-s5:05955] [14] /usr/lib64/R/lib/libR.so [0x30344c6983]
> [r10a-s5:05955] [15] /usr/lib64/R/lib/libR.so(Rf_eval+0x456)
> [0x30344c2216]
> [r10a-s5:05955] [16] /usr/lib64/R/lib/libR.so(Rf_applyClosure+0x2a5)
> [0x30344c4935]
> [r10a-s5:05955] [17] /usr/lib64/R/lib/libR.so(Rf_eval+0x328)
> [0x30344c20e8]
> [r10a-s5:05955] [18] /usr/lib64/R/lib/libR.so [0x30344c279d]
> [r10a-s5:05955] [19] /usr/lib64/R/lib/libR.so(Rf_eval+0x4fd)
> [0x30344c22bd]
> [r10a-s5:05955] [20] /usr/lib64/R/lib/libR.so [0x30344c279d]
> [r10a-s5:05955] [21] /usr/lib64/R/lib/libR.so(Rf_eval+0x4fd)
> [0x30344c22bd]
> [r10a-s5:05955] [22] /usr/lib64/R/lib/libR.so [0x30344c7438]
> [r10a-s5:05955] [23] /usr/lib64/R/lib/libR.so(Rf_eval+0x456)
> [0x30344c2216]
> [r10a-s5:05955] [24] /usr/lib64/R/lib/libR.so [0x30344c6983]
> [r10a-s5:05955] [25] /usr/lib64/R/lib/libR.so(Rf_eval+0x456)
> [0x30344c2216]
> [r10a-s5:05955] [26] /usr/lib64/R/lib/libR.so(Rf_eval+0x456)
> [0x30344c2216]
> [r10a-s5:05955] [27] /usr/lib64/R/lib/libR.so [0x30344c6983]
> [r10a-s5:05955] [28] /usr/lib64/R/lib/libR.so(Rf_eval+0x456)
> [0x30344c2216]
> [r10a-s5:05955] [29] /usr/lib64/R/lib/libR.so(Rf_eval+0x456)
> [0x30344c2216]
> [r10a-s5:05955] *** End of error message ***
>
> Any ideas on what I might be doing wrong?
>
> Vik


-- 
Department of Statistics & Actuarial Sciences
The University of Western Ontario
London, Ontario N6A 5B7
Office Phone#:(519)-661-3622
Fax Phone#:(519)-661-3813
http://www.stats.uwo.ca/faculty/yu
