[R-sig-hpc] Rmpi loads 2 versions of the same library

Ross Boylan ross at biostat.ucsf.edu
Fri Mar 14 18:06:47 CET 2014


On Fri, 2014-03-14 at 09:12 -0500, Dirk Eddelbuettel wrote:
> Hi Ei-ji,
> 
> On 14 March 2014 at 11:51, Ei-ji Nakama wrote:
> | hi,
> | 
> | 2014-03-14 4:16 GMT+09:00 Ross Boylan <ross at biostat.ucsf.edu>:
> | > I also don't know why libmpi.so.0 is preferred to libmpi.so.1 in the
> | > explicit load above.
> | 
> | I also had the feeling that this behavior was inconsistent.
> | Has anyone tried this on other platforms (other than Mac and Linux)?
> | The following change might be preferable for the time being.
> 
> My memory is really foggy as this issue affected me many years ago when I was
> both using Open MPI much more and also looking after the Debian packages for
> it.  There was an issue with symbols not being found across the various
> shared libraries used by Open MPI.  And now I can't recall if RTLD_GLOBAL
> helped, hurt or was required.  These days you know so much more about this
> than I do.
FWIW I seemed to get the same results with and without the dlopen
statement, provided I changed it to try the right version of libmpi
first.  However, this was using MPI configured with --disable-dlopen
--enable-static.  The Rmpi changelog noted that --disable-dlopen was no
longer necessary, at least on Debian, but it might still matter if the
programmatic dlopen is removed.

At any rate, I'm running with the dlopen enabled now.
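
A minimal sketch of the ordering change described above: try the preferred
soname first and fall back to the others only if it fails to load.  The
helper name open_first_available and the idea of passing a candidate list
are illustrative assumptions, not code from Rmpi.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Try each candidate name in order and return the first handle that loads.
 * On success *which holds the index of the name that worked; otherwise it
 * is left at -1 and NULL is returned. */
void *open_first_available(const char *const *names, size_t n, int *which)
{
    assert(names != NULL && which != NULL);
    *which = -1;
    for (size_t i = 0; i < n; i++) {
        void *handle = dlopen(names[i], RTLD_GLOBAL | RTLD_LAZY);
        if (handle != NULL) {
            *which = (int)i;
            return handle;
        }
        /* dlerror() describes the most recent failure and then clears it. */
        fprintf(stderr, "skipping %s: %s\n", names[i], dlerror());
    }
    return NULL;
}
```

For the setup in this thread the list would presumably be "libmpi.so.1",
"libmpi.so.0", "libmpi.so", in that order, since the LD_DEBUG output shows
Rmpi.so needing libmpi.so.1.  On older glibc the program must be linked
with -ldl.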
> 
> On a Debian-based system, I would recommend getting in touch with the current
> maintainers. Sylvestre and Manuel know a lot about this.  And of course with
> Hao, the Rmpi author. He just responded almost immediately to a recent
> (Debian) bug report regarding which file to try dlopen on, and that (I think)
> was the same code segment as below.

I've been in touch with Hao.
> 
> Lastly, if I were in Ross's shoes, I'd simply create a newer, local package
> which will avoid the issue of a conflict between which lib to pick...
Do you mean updating the system's mpi?  That would be a very radical
step.

It seems to be working on one machine now, though MPI won't launch on
any other nodes.  I have to investigate whether the issue is that
the remote nodes are using the system MPI, or that they lack
some of the libraries present on the node I've been using to build (on
which we installed a bunch of packages to support the build).
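
One way to check, on a given node, which copies of a library a process has
actually mapped is to scan /proc/self/maps from inside that process.  The
sketch below is Linux-specific, and the function name count_mapped_objects
is a hypothetical helper, not part of Rmpi or Open MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print each file mapped into this process whose path contains `needle`,
 * and return how many were found, or -1 if /proc/self/maps cannot be read.
 * Consecutive mappings of the same file are reported once. */
int count_mapped_objects(const char *needle)
{
    char line[4096];
    char last[4096] = "";
    int count = 0;
    FILE *maps;

    assert(needle != NULL);
    maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    while (fgets(line, sizeof line, maps) != NULL) {
        char *path = strchr(line, '/');   /* file-backed mappings only */
        if (path == NULL)
            continue;
        path[strcspn(path, "\n")] = '\0'; /* drop the trailing newline */
        if (strstr(path, needle) != NULL && strcmp(path, last) != 0) {
            printf("mapped: %s\n", path);
            strcpy(last, path);           /* path is shorter than line[] */
            count++;
        }
    }
    fclose(maps);
    return count;
}
```

Calling count_mapped_objects("libmpi") after MPI has been loaded would list
every libmpi file in the process; more than one line of output would point
to the two-versions situation this thread is about.  The same information
is available from outside the process by reading /proc/<pid>/maps.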

Ross
> 
> Dirk
> 
>  
> | --- Rmpi.orig/src/Rmpi.c    2013-03-27 02:21:49.000000000 +0900
> | +++ Rmpi/src/Rmpi.c    2014-03-14 11:36:43.000000000 +0900
> | @@ -18,6 +18,7 @@
> |  #include "Rmpi.h"
> | 
> |  #ifdef OPENMPI
> | +#define __USE_GNU
> |  #include <dlfcn.h>
> |  #endif
> | 
> | @@ -69,13 +70,19 @@
> | 
> |  #ifndef MAC
> |  #ifdef OPENMPI
> | -    if (!dlopen("libmpi.so.0", RTLD_GLOBAL | RTLD_LAZY)
> | -    && !dlopen("libmpi.so", RTLD_GLOBAL | RTLD_LAZY)){
> | -    //&& !dlopen("libmpi.dylib", RTLD_GLOBAL | RTLD_LAZY)
> | -     //&& !dlopen("libmpi.1.dylib", RTLD_GLOBAL | RTLD_LAZY)) {
> | -       Rprintf("%s\n",dlerror());
> | -        return AsInt(0);
> | -    }
> | +      { /* ifndef from MAC to __linux__ ? if only problem on linux */
> | +        Dl_info info_MPI_Init;
> | +        int rc = dladdr((void *)MPI_Init, &info_MPI_Init);
> | +        if(rc){
> | +          if (!dlopen(info_MPI_Init.dli_fname, RTLD_GLOBAL | RTLD_LAZY)){
> | +            Rprintf("%s\n",dlerror());
> | +            return AsInt(0);
> | +          }
> | +        }else{
> | +          Rprintf("dladdr failed for MPI_Init\n"); /* dladdr() does not set dlerror() */
> | +          return AsInt(0);
> | +        }
> | +      }
> |  #endif
> |  #endif
> | 
> | 
> | >
> | > Using LD_DEBUG shows
> | > 24312:     file=libmpi.so.1 [0];  needed by /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so [0]
> | > 24312:     find library=libmpi.so.1 [0]; searching
> | > 24312:      search path=/usr/lib64/R/lib:/home/ross/install/lib            (LD_LIBRARY_PATH)
> | > 24312:       trying file=/usr/lib64/R/lib/libmpi.so.1
> | > 24312:       trying file=/home/ross/install/lib/libmpi.so.1
> | >
> | > and, later,
> | >      24312:     file=libmpi.so.0 [0];  needed by /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so [0]
> | >      24312:     find library=libmpi.so.0 [0]; searching
> | >      24312:      search path=/usr/lib64/R/lib:/home/ross/install/lib            (LD_LIBRARY_PATH)
> | >      24312:       trying file=/usr/lib64/R/lib/libmpi.so.0
> | >      24312:       trying file=/home/ross/install/lib/libmpi.so.0
> | >      24312:      search cache=/etc/ld.so.cache
> | >      24312:       trying file=/usr/lib/libmpi.so.0
> | >
> | > Does anyone know what's going on?
> | >
> | > Ross Boylan
> | >
> | > P.S. This might be relevant:
> | >      24300:     calling init: /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so
> | >      24300:
> | >      24300:     opening file=/home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so [0]; direct_opencount=1
> | >      24300:
> | >      24300:     /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so: error: symbol lookup error: undefined symbol: R_init_Rmpi (fatal)
> | >      24300:     /home/ross/Rlib-3.0.1/Rmpi/libs/Rmpi.so: error: symbol lookup error: undefined symbol: R_init_Rmpi (fatal)
> | >
> | > _______________________________________________
> | > R-sig-hpc mailing list
> | > R-sig-hpc at r-project.org
> | > https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> | 
> | 
> | 
> | -- 
> | Best Regards,
> | --
> | EI-JI Nakama  <nakama (a) ki.rim.or.jp>
> | "中間栄治"  <nakama (a) ki.rim.or.jp>
> | 
>
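
The approach in the quoted patch, asking the dynamic linker which file
already provides MPI_Init and re-opening that file with RTLD_GLOBAL, can be
sketched as a standalone helper.  The function names object_path_of and
reopen_global are illustrative assumptions; only dladdr, dlopen and dlerror
are real library calls.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc declares dladdr() and Dl_info only with this */
#endif
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Return the path of the loaded object that contains `addr`, or NULL.
 * dladdr() does not set dlerror() on failure, so none is consulted here. */
const char *object_path_of(void *addr)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return NULL;
    return info.dli_fname;
}

/* Re-open the object that already provides `addr` with RTLD_GLOBAL so that
 * its symbols are visible to libraries loaded later.  Returns the handle,
 * or NULL after printing a diagnostic. */
void *reopen_global(void *addr)
{
    const char *path = object_path_of(addr);
    void *handle;

    if (path == NULL) {
        fprintf(stderr, "dladdr: %p is not inside a loaded object\n", addr);
        return NULL;
    }
    handle = dlopen(path, RTLD_GLOBAL | RTLD_LAZY);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
    return handle;
}
```

In the Rmpi setting the call would be reopen_global((void *)MPI_Init), which
needs mpi.h and a link against the MPI library.  The point of the design is
that no soname is hard-coded, so whichever libmpi the linker already
resolved is the one that gets promoted to global visibility.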


