[R-sig-hpc] Rmpi on NetBSD with OpenMPI

Kevin.Buckley at ecs.vuw.ac.nz
Mon May 17 05:02:46 CEST 2010


It would appear there is a fix for the issue I was seeing.

After some discussion around the underlying cause of the
problem on the OpenMPI devel list, Jeff Squyres wrote:

> Now, all this being said, IIRC (and I very well may not!), the real
> underlying issue here is that R is dlopening libmpi.so, which, in turn, is
> dlopening its own DSOs.  Given the global linker scoping issues, OMPI's
> DSOs are unable to find the symbols they need to resolve in the process
> (because libmpi.so's was opened in a private scope).
>
> This probably is unfortunately larger than us (Open MPI) -- it's really a
> POSIX issue.  What would be ideal is if different linker namespaces could
> be something more fine-grained than "global" or "private" within a
> process.  E.g., if the private namespace of libmpi.so in the process could
> selectively make its symbol namespace available to the DSOs that it
> dlopens.  Right now, the only option libmpi.so has is to be opened
> with a public scope, which somewhat defeats the point of private
> scoping.
>

Tying in with the suggestions Jeff makes above, there would seem to
be a work-around fix for this, in the case of the Rmpi package
on NetBSD anyway.

Furthermore, the fix does not require any alterations to OpenMPI.

Apparently, there has been a similar issue within PAM on NetBSD,
namely symbol visibility when chaining shared library loads.

Mark Davies has now determined a way to force the Rmpi package
to load libmpi.so ahead of loading the Rmpi shared library itself,
so that what appear to be the missing symbols are then available
for any future loads of the OpenMPI component libraries.


On the version of Rmpi that I have been using, 0.5-8, the "fix"
can be effected by the following one-line patch:

--- Rmpi/R/zzz.R        2009-02-04 05:27:08.000000000 +1300
+++ Rmpi.local/R/zzz.R  2010-05-17 14:25:27.000000000 +1200
@@ -7,6 +7,7 @@
     #    cat(vertxt)

     # Check if lam-mpi is running
+    dyn.load("/usr/pkg/lib/libmpi.so", local=FALSE)
     library.dynam("Rmpi", pkg, lib)
     if (!TRUE)
        stop("Fail to load Rmpi dynamic library.")


Note that this currently hard-codes the path to libmpi.so,
which for our system is in the standard NetBSD PkgSrc location,
though there are probably "nicer" ways to achieve the same end,
with greater flexibility, using R internals.
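
One possible shape for such a "nicer" approach, as a minimal sketch
only: search a few candidate directories for libmpi.so and load the
first match with global symbol scope. The helper names find_libmpi()
and preload_libmpi(), the environment variable RMPI_LIBMPI_DIR and
the directory list are all hypothetical; none of them are part of
Rmpi or R.

```r
# Hypothetical helper: locate libmpi.so instead of hard-coding its path.
# RMPI_LIBMPI_DIR and the candidate directories are assumptions made
# for illustration; adjust them to the local installation.
find_libmpi <- function(dirs = c(Sys.getenv("RMPI_LIBMPI_DIR"),
                                 "/usr/pkg/lib", "/usr/local/lib", "/usr/lib"),
                        name = "libmpi.so") {
    dirs <- dirs[nzchar(dirs)]              # drop an unset environment variable
    candidates <- file.path(dirs, name)
    hits <- candidates[file.exists(candidates)]
    if (length(hits) == 0L) return(NA_character_)
    hits[[1L]]                              # first match wins
}

# Hypothetical wrapper: load the library with global symbol scope,
# as the patch does, but only if a copy was actually found.
preload_libmpi <- function(path = find_libmpi()) {
    if (is.na(path)) return(invisible(FALSE))
    dyn.load(path, local = FALSE)           # make symbols globally visible
    invisible(TRUE)
}
```

With something like this, the patched line in zzz.R could become a
call to preload_libmpi() placed just before library.dynam(), and a
site with an unusual layout could point RMPI_LIBMPI_DIR at the right
directory rather than editing the package.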

Having said that, this "fix" does not seem to be needed on
platforms that have a global scope for shared library symbols,
so attempts to make it generic may be pointless.
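
If the global preload were to be kept out of the way on platforms
that do not need it, one option might be a guard on the operating
system name. The sketch below is illustrative only: the function
name needs_global_preload() is hypothetical, and the list of affected
platforms is an assumption, since NetBSD is the only system reported
here as needing the work-around.

```r
# Hypothetical guard: decide whether libmpi.so should be preloaded
# with global scope. Only NetBSD is listed because it is the only
# platform known from this report to need it (an assumption, not a
# tested list).
needs_global_preload <- function(sysname = Sys.info()[["sysname"]],
                                 affected = c("NetBSD")) {
    # isTRUE() covers the case where Sys.info() is unavailable (NULL)
    isTRUE(sysname %in% affected)
}
```

The dyn.load(..., local=FALSE) call from the patch could then be
wrapped in if (needs_global_preload()) { ... }, leaving other
platforms on their existing load path.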

Kevin

-- 
Kevin M. Buckley                                  Room:  CO327
School of Engineering and                         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand


