[R-sig-hpc] Rmpi loads 2 versions of the same library [SOLVED, BUT..]

Ross Boylan ross at biostat.ucsf.edu
Thu Mar 13 20:57:25 CET 2014


I'm happy to report that Rmpi now loads only my personal MPI libraries.
I believe the critical change was to the dlopen code in Rmpi.c, which now reads:
    // Ross Boylan: changed the order to search for libmpi.so before libmpi.so.0
    // 2014-03-13
    if (!dlopen("libmpi.so", RTLD_GLOBAL | RTLD_LAZY)
        && !dlopen("libmpi.so.0", RTLD_GLOBAL | RTLD_LAZY)){
However, I changed a lot of other things too: I rebuilt MPI with special
options, rebuilt a local copy of R configured for the local MPI, and
rebuilt Rmpi against both. I followed the advice from Bennet Fauber here:
http://www.open-mpi.org/community/lists/users/2014/03/23823.php
(though I didn't do precisely what he said).

I'm not so happy to report that the original problem that motivated the
whole exercise remains; in fact, it has gotten slightly worse:
mpi.isend.Robj does not seem to work properly. I am sending to a
fake receiver (at rank 1) that does nothing but print a line when it
gets a message. r is a list whose serialized size is:
> length(serialize(r, NULL))
[1] 599499
> mpi.send.Robj(1, 1, 4)
Fake Assembler: 0 4 numeric
> mpi.send.Robj(r, 1, 4)  # send of r works
NULL
> Fake Assembler: 0 4 list
mpi.isend.Robj(1, 1, 4)  # isend of number works
> Fake Assembler: 0 4 numeric
mpi.isend.Robj(r, 1, 4)  # sometimes this used to work the first time
mpi.isend.Robj(r, 1, 4)
> mpi.send.Robj(r, 1, 4) # sometimes used to get previous message unstuck
# never get the command prompt back

Ross



On Thu, 2014-03-13 at 12:16 -0700, Ross Boylan wrote:
> I've been trying to get Rmpi to work with my personal copy of MPI, which
> is newer than the system's.  Even when I set LD_LIBRARY_PATH
> appropriately and build Rmpi with
> 
>   export LD_LIBRARY_PATH=/home/ross/install/lib:$LD_LIBRARY_PATH
>   export PATH=/home/ross/install/bin:$PATH
>   # Not sure what I should use for --with-mpi
>   R CMD INSTALL Rmpi --configure-args="--with-Rmpi-include=/home/ross/install/include \
>     --with-Rmpi-libpath=/home/ross/install/lib --with-mpi=/home/ross/install \
>     --with-Rmpi-type=OPENMPI"
> 
> I find that the R process opens both the system and personal copies of
> the MPI-related libraries (according to lsof and /proc/nnn/maps).  ldd on
> my Rmpi.so shows only references to my local copies.  I think the paths
> shown by ldd are simply advisory.
> 
> I think the cause is this code in Rmpi.c:
>     if (!dlopen("libmpi.so.0", RTLD_GLOBAL | RTLD_LAZY)
>         && !dlopen("libmpi.so", RTLD_GLOBAL | RTLD_LAZY)){
> http://www.stats.uwo.ca/faculty/yu/Rmpi/changelogs.htm notes
> ----------------------------------
> 2007-10-24, version 0.5-5:
> 
> dlopen has been used to load libmpi.so explicitly. This is mainly useful for Rmpi under OpenMPI where one might see many error messages:
> mca: base: component_find: unable to open osc pt2pt: file not found (ignored)
> if libmpi.so is not loaded with RTLD_GLOBAL flag.
> -------------------------------------
> 
> I'm not sure which version of MPI ends up getting used.
> 
> I also don't know why libmpi.so.0 is preferred to libmpi.so.1 in the
> explicit load above.
> 
> Using LD_DEBUG shows
> 24312:     file=libmpi.so.1 [0];  needed by /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so [0]
> 24312:     find library=libmpi.so.1 [0]; searching
> 24312:      search path=/usr/lib64/R/lib:/home/ross/install/lib            (LD_LIBRARY_PATH)
> 24312:       trying file=/usr/lib64/R/lib/libmpi.so.1
> 24312:       trying file=/home/ross/install/lib/libmpi.so.1
> 
> and, later,
> 24312:     file=libmpi.so.0 [0];  needed by /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so [0]
> 24312:     find library=libmpi.so.0 [0]; searching
> 24312:      search path=/usr/lib64/R/lib:/home/ross/install/lib            (LD_LIBRARY_PATH)
> 24312:       trying file=/usr/lib64/R/lib/libmpi.so.0
> 24312:       trying file=/home/ross/install/lib/libmpi.so.0
> 24312:      search cache=/etc/ld.so.cache
> 24312:       trying file=/usr/lib/libmpi.so.0
> 
> Does anyone know what's going on?
> 
> Ross Boylan
> 
> P.S. This might be relevant:
> 24300:     calling init: /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so
> 24300:
> 24300:     opening file=/home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so [0]; direct_opencount=1
> 24300:
> 24300:     /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so: error: symbol lookup error: undefined symbol: R_init_Rmpi (fatal)
> 24300:     /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so: error: symbol lookup error: undefined symbol: R_init_Rmpi (fatal)
> 
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


