[R-sig-hpc] Rmpi loads 2 versions of the same library [SOLVED, BUT..]
Ross Boylan
ross at biostat.ucsf.edu
Thu Mar 13 20:57:25 CET 2014
I'm happy to report that Rmpi now loads only my personal MPI libraries.
I believe the critical change was reordering the dlopen calls in Rmpi.c to read:
//Ross Boylan changes order to search for libmpi.so before libmpi.so.0
// 2014-03-13
if (!dlopen("libmpi.so", RTLD_GLOBAL | RTLD_LAZY)
&& !dlopen("libmpi.so.0", RTLD_GLOBAL | RTLD_LAZY)){
but I changed a lot of other things too: I rebuilt MPI with special
options, rebuilt my local copy of R configured for the local MPI, and
rebuilt Rmpi against both. I followed the advice from Bennet Fauber here:
http://www.open-mpi.org/community/lists/users/2014/03/23823.php
(though I didn't do precisely what he said).
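
For anyone who wants to repeat the check on which copies of libmpi are mapped into the R process, here is a minimal sketch, assuming a Linux /proc filesystem. The function name list_mapped_libs and the R_PID variable are placeholders of mine, not part of Rmpi or any existing tool.

```shell
# list_mapped_libs MAPS_FILE PATTERN
# Print the distinct file paths in a /proc/<pid>/maps-style file whose
# path (6th field) matches PATTERN, an awk regular expression.
# Assumption: paths contain no spaces.
list_mapped_libs() {
    awk -v pat="$2" '$6 ~ pat { print $6 }' "$1" | sort -u
}

# Example (R_PID is a placeholder for the PID of the running R process):
#   list_mapped_libs "/proc/$R_PID/maps" 'libmpi'
```

If this prints paths from both the system directories and /home/ross/install/lib, two copies are still loaded.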
I'm not so happy to report that the original problem that motivated the
whole exercise remains; in fact it's gotten slightly worse.
mpi.isend.Robj does not seem to be working properly. I am sending to a
fake receiver (at rank 1) that does nothing but print a message when it
gets a message. r is a list whose serialized length is
> length(serialize(r, NULL))
[1] 599499
> mpi.send.Robj(1, 1, 4)
Fake Assembler: 0 4 numeric
> mpi.send.Robj(r, 1, 4) # send of r works
NULL
> Fake Assembler: 0 4 list
mpi.isend.Robj(1, 1, 4) # isend of number works
> Fake Assembler: 0 4 numeric
mpi.isend.Robj(r, 1, 4) # sometimes this used to work the first time
mpi.isend.Robj(r, 1, 4)
> mpi.send.Robj(r, 1, 4) # sometimes used to get previous message unstuck
# never get the command prompt back
Ross
On Thu, 2014-03-13 at 12:16 -0700, Ross Boylan wrote:
> I've been trying to get Rmpi to work with my personal copy of MPI, which
> is newer than the system's. Even when I set LD_LIBRARY_PATH
> appropriately, and build Rmpi with
>
> export LD_LIBRARY_PATH=/home/ross/install/lib:$LD_LIBRARY_PATH
> export PATH=/home/ross/install/bin:$PATH
> # Not sure what I should use for --with-mpi
> R CMD INSTALL Rmpi --configure-args='--with-Rmpi-include=/home/ross/install/include --with-Rmpi-libpath=/home/ross/install/lib --with-mpi=/home/ross/install --with-Rmpi-type=OPENMPI'
>
> I find that the R process opens both the system and personal copies of
> mpi-related libs (according to lsof and /proc/nnn/maps). ldd on my
> Rmpi.so shows only references to my local copies. I think the paths
> shown by ldd are simply advisory.
>
> I think the cause is this code in Rmpi.c:
> if (!dlopen("libmpi.so.0", RTLD_GLOBAL | RTLD_LAZY)
> && !dlopen("libmpi.so", RTLD_GLOBAL | RTLD_LAZY)){
> http://www.stats.uwo.ca/faculty/yu/Rmpi/changelogs.htm notes
> ----------------------------------
> 2007-10-24, version 0.5-5:
>
> dlopen has been used to load libmpi.so explicitly. This is mainly useful for Rmpi under OpenMPI where one might see many error messages:
> mca: base: component_find: unable to open osc pt2pt: file not found (ignored)
> if libmpi.so is not loaded with RTLD_GLOBAL flag.
> -------------------------------------
>
> I'm not sure which version of mpi ends up getting used.
>
> I also don't know why libmpi.so.0 is preferred to libmpi.so.1 in the
> explicit load above.
>
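
On the question of which file a given library name resolves to, here is a rough sketch of only the LD_LIBRARY_PATH stage of the lookup. It deliberately ignores RPATH/RUNPATH, /etc/ld.so.cache and the default system directories, so it is not a full model of the loader. The function name first_in_path is hypothetical.

```shell
# first_in_path NAME DIRLIST
# Print the first DIR/NAME that exists, scanning the colon-separated DIRLIST
# from left to right. Returns non-zero if no directory contains NAME.
first_in_path() (
    set -f          # no globbing while splitting DIRLIST
    IFS=:
    for dir in $2; do
        [ -n "$dir" ] || dir=.      # the loader treats an empty entry as the current directory
        if [ -e "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            exit 0
        fi
    done
    exit 1
)

# Example:
#   first_in_path libmpi.so.0 "$LD_LIBRARY_PATH" ||
#       echo "not on LD_LIBRARY_PATH; the loader falls back to other locations"
```

When a name is missing from every LD_LIBRARY_PATH directory, the loader can still find it elsewhere, for instance in a system directory.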
> Using LD_DEBUG shows
> 24312: file=libmpi.so.1 [0]; needed by /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so [0]
> 24312: find library=libmpi.so.1 [0]; searching
> 24312: search path=/usr/lib64/R/lib:/home/ross/install/lib (LD_LIBRARY_PATH)
> 24312: trying file=/usr/lib64/R/lib/libmpi.so.1
> 24312: trying file=/home/ross/install/lib/libmpi.so.1
>
> and, later,
> 24312: file=libmpi.so.0 [0]; needed by /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so [0]
> 24312: find library=libmpi.so.0 [0]; searching
> 24312: search path=/usr/lib64/R/lib:/home/ross/install/lib (LD_LIBRARY_PATH)
> 24312: trying file=/usr/lib64/R/lib/libmpi.so.0
> 24312: trying file=/home/ross/install/lib/libmpi.so.0
> 24312: search cache=/etc/ld.so.cache
> 24312: trying file=/usr/lib/libmpi.so.0
>
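
The log above is easier to read when reduced to one line per library. This is a minimal sketch, assuming the output came from LD_DEBUG=libs and that the last "trying file=" line for a library is the one the loader actually opened (that is my reading of the format, not something I have checked in the loader source). The names last_tried and ld_debug.log are hypothetical.

```shell
# last_tried LOG_FILE
# For each "find library=NAME" entry in LD_DEBUG=libs output, print NAME
# followed by the last "trying file=PATH" seen for it, or "(none)".
last_tried() {
    awk '
        /find library=/ {
            if (name != "") print name, path
            split($0, a, "find library=")
            split(a[2], b, " ")
            name = b[1]
            path = "(none)"
        }
        /trying file=/ {
            split($0, a, "trying file=")
            path = a[2]
        }
        END { if (name != "") print name, path }
    ' "$1"
}

# Example (LD_DEBUG output goes to stderr unless LD_DEBUG_OUTPUT is set):
#   LD_DEBUG=libs R --vanilla -e 'library(Rmpi)' 2> ld_debug.log
#   last_tried ld_debug.log | grep libmpi
```

Applied to the excerpt above, this pairs libmpi.so.1 with /home/ross/install/lib/libmpi.so.1 and libmpi.so.0 with /usr/lib/libmpi.so.0, which would be consistent with both the personal and the system copy being loaded.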
> Does anyone know what's going on?
>
> Ross Boylan
>
> P.S. This might be relevant:
> 24300: calling init: /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so
> 24300:
> 24300: opening file=/home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so [0]; direct_opencount=1
> 24300:
> 24300: /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so: error: symbol lookup error: undefined symbol: R_init_Rmpi (fatal)
> 24300: /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so: error: symbol lookup error: undefined symbol: R_init_Rmpi (fatal)
>