[Rd] problem with Rmpi 0.5-5 and openmpi

Peter Pearman pearman at wsl.ch
Sat Apr 12 09:50:31 CEST 2008


Luke, Brian,

Thank you for your suggestions.  Since we were using 0.5-5, the patch 
had already been incorporated.  It turned out that we had a couple of 
problems.  The strange one actually lay in the configuration of the 
cluster: while we believed the entire cluster had Myrinet, the master in 
fact does not.  Why? I don't know; we didn't configure it.  But because 
we had configured openmpi to use Myrinet, some files openmpi was looking 
for were simply not there on the master.  By logging onto a slave node 
and using it as the master, everything works fine.
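For anyone who hits the same symptom, a rough sketch of how to confirm whether a node actually has a GM-visible Myrinet NIC, and how to tell Open MPI to skip the gm transport on such a node rather than emit the warning, is below. gm_board_info ships with the GM driver, and ^gm is Open MPI's MCA syntax for excluding a BTL component; the GM path follows our installation and will differ elsewhere.

```shell
# Check whether this node has a GM-visible Myrinet NIC
# (gm_board_info lives in the GM installation's bin directory).
/opt/gm/2.1.21/2.6.11-21smp/bin/gm_board_info

# On a node with no NIC, exclude the gm BTL so Open MPI falls
# back to TCP without the startup warning.
mpirun --mca btl ^gm -np 1 hostname
```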

We have not tested how things really work with Myrinet, Luke, but if we 
don't post to the list again about it, you can assume that it works OK now.
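In case it helps others with the same symptom (only the master's CPUs visible), a minimal sketch of the sanity check we ran after the fix, assuming Rmpi and snow are installed; the slave count of 4 is just illustrative:

```r
library(Rmpi)
library(snow)

# Spawn slaves via Rmpi; with a correct MPI setup these should
# land on the compute nodes, not all pile up on the master.
cl <- makeMPIcluster(4)

# Ask each slave for its hostname to confirm the spread.
clusterCall(cl, function() Sys.info()[["nodename"]])

stopCluster(cl)
mpi.quit()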

Again, thanks for your suggestions.

Best,

Peter

Luke Tierney wrote:
> You may want to be a little careful about using R and MPI with
> Myrinet. We tried to get that to work around 5 years ago and found it
> more trouble than it was worth (we ended up sending the Myrinet
> equipment back and getting gigabit ethernet instead).
> 
> As I recall, the problem at the time was that the Myrinet libraries
> used their own malloc (or assumed that their own malloc was used -- I
> don't recall which), whereas a standard R build would use the system
> malloc.  The resulting combination would work for small allocations.
> But for large allocations allocated by R using the standard malloc
> (large meaning allocated by memory mapping) the Myrinet internals
> would assume these allocations had been allocated by their malloc and
> start doing things with them that led to crashes.  I'm fuzzy on the
> details but this should give you an idea of what to look out for.
> 
> Best,
> 
> luke
> 
> On Mon, 7 Apr 2008, Peter Pearman wrote:
> 
>> Dear knowledgeable experts :-),
>>
>> I am trying to get openmpi, Rmpi and SNOW running on a Myrinet/GM
>> cluster.  I'm not an IT expert, but I surely could use a working
>> installation of Rmpi and SNOW.
>>
>> I try to load the Rmpi library and get the following:
>>
>> > library(Rmpi)
>> [master:07230] mca: base: component_find: unable to open osc pt2pt: file
>> not found (ignored)
>> -------------------------------------------------------------------------- 
>>
>> [0,0,0]: Myrinet/GM on host master was unable to find any NICs.
>> Another transport will be used instead, although this may result in
>> lower performance.
>>
>> Then, if I start a cluster, only nodes on the master can be seen and initiated.
>>
>> Here is the environment, configuration, and installation information:
>>
>>
>> # configuration of openmpi
>> ./configure --prefix=/opt/openmpi --with-gm=/opt/gm/2.1.21/2.6.11-21smp
>> --with-sge=/opt/sge --disable-mpi-f90
>>
>> #installation of Rmpi
>> /usr/local/bin/R CMD INSTALL Rmpi_0.5-5.tar.gz \
>> --configure-args=--with-mpi=/opt/openmpi
>>
>> R version 2.6.1 (2007-11-26)
>>
>> #master environment
>> pearman at master:~> env
>> LESSKEY=/etc/lesskey.bin
>> NNTPSERVER=news
>> INFODIR=/usr/local/info:/usr/share/info:/usr/info
>> MANPATH=/opt/sge/man:/usr/local/man:/usr/local/share/man:/usr/share/man:/usr/X11R6/man:/opt/gnome/share/man:/opt/c3-4/man
>> HOSTNAME=master
>> GNOME2_PATH=/usr/local:/opt/gnome:/usr
>> XKEYSYMDB=/usr/X11R6/lib/X11/XKeysymDB
>> HOST=master
>> TERM=xterm
>> SHELL=/bin/bash
>> PROFILEREAD=true
>> HISTSIZE=1000
>> SSH_CLIENT=::ffff:10.15.1.179 51565 22
>> QTDIR=/usr/lib/qt3
>> OLDPWD=/opt/openmpi/bin
>> SSH_TTY=/dev/pts/6
>> GROFF_NO_SGR=yes
>> JRE_HOME=/usr/lib/jvm/java/jre
>> USER=pearman
>> LS_COLORS=no=00:fi=00:di=01;34:ln=00;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31:ex=00;32:*.cmd=00;32:*.exe=01;32:*.com=01;32:*.bat=01;32:*.btm=01;32:*.dll=01;32:*.tar=00;31:*.tbz=00;31:*.tgz=00;31:*.rpm=00;31:*.deb=00;31:*.arj=00;31:*.taz=00;31:*.lzh=00;31:*.zip=00;31:*.zoo=00;31:*.z=00;31:*.Z=00;31:*.gz=00;31:*.bz2=00;31:*.tb2=00;31:*.tz2=00;31:*.tbz2=00;31:*.avi=01;35:*.bmp=01;35:*.fli=01;35:*.gif=01;35:*.jpg=01;35:*.jpeg=01;35:*.mng=01;35:*.mov=01;35:*.mpg=01;35:*.pcx=01;35:*.pbm=01;35:*.pgm=01;35:*.png=01;35:*.ppm=01;35:*.tga=01;35:*.tif=01;35:*.xbm=01;35:*.xpm=01;35:*.dl=01;35:*.gl=01;35:*.aiff=00;32:*.au=00;32:*.mid=00;32:*.mp3=00;32:*.ogg=00;32:*.voc=00;32:*.wav=00;32:
>> LD_LIBRARY_PATH=/opt/openmpi/lib:/opt/sge/lib/lx24-amd64
>> XNLSPATH=/usr/X11R6/lib/X11/nls
>> HOSTTYPE=x86_64
>> PAGER=less
>> XDG_CONFIG_DIRS=/usr/local/etc/xdg/:/etc/xdg/:/etc/opt/gnome/xdg/
>> C3_PATH=/opt/c3-4
>> MINICOM=-c on
>> MAIL=/var/mail/pearman
>> PATH=/opt/openmpi/bin:/opt/sge/bin/lx24-amd64:/opt/mpich/1.2.6..14b/x86_64/gcc//bin:/opt/gm/2.1.21/2.6.11-21smp/bin:/opt/c3-4:/home/pearman/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/opt/pathscale/bin
>> CPU=x86_64
>> JAVA_BINDIR=/usr/lib/jvm/java/bin
>> INPUTRC=/etc/inputrc
>> PWD=/home/pearman
>> JAVA_HOME=/usr/lib/jvm/java
>> LANG=en_US.UTF-8
>> PYTHONSTARTUP=/etc/pythonstart
>> SGE_ROOT=/opt/sge
>> SDK_HOME=/usr/lib/jvm/java
>> C3_RSH=ssh
>> TEXINPUTS=:/home/pearman/.TeX:/usr/share/doc/.TeX:/usr/doc/.TeX
>> JDK_HOME=/usr/lib/jvm/java
>> SHLVL=1
>> HOME=/home/pearman
>> LESS_ADVANCED_PREPROCESSOR=no
>> OSTYPE=linux
>> LS_OPTIONS=-N --color=tty -T 0
>> XCURSOR_THEME=crystalwhite
>> WINDOWMANAGER=/usr/X11R6/bin/kde
>> GTK_PATH=/usr/local/lib/gtk-2.0:/opt/gnome/lib/gtk-2.0:/usr/lib/gtk-2.0
>> GM_PATH=/opt/gm/2.1.21/2.6.11-21smp
>> G_FILENAME_ENCODING=@locale,UTF-8,ISO-8859-15,CP1252
>> LESS=-M -I
>> MACHTYPE=x86_64-suse-linux
>> LOGNAME=pearman
>> GTK_PATH64=/usr/local/lib64/gtk-2.0:/opt/gnome/lib64/gtk-2.0:/usr/lib64/gtk-2.0
>> CVS_RSH=ssh
>> XDG_DATA_DIRS=/usr/local/share/:/usr/share/:/etc/opt/kde3/share/:/opt/kde3/share/:/opt/gnome/share/
>> MPICH_PATH=/opt/mpich/1.2.6..14b/x86_64/gcc/
>> ACLOCAL_FLAGS=-I /opt/gnome/share/aclocal
>> SSH_CONNECTION=::ffff:10.15.1.179 51565 ::ffff:10.30.1.15 22
>> PKG_CONFIG_PATH=/opt/gnome/lib64/pkgconfig
>> LESSOPEN=lessopen.sh %s
>> INFOPATH=/usr/local/info:/usr/share/info:/usr/info:/opt/gnome/share/info
>> LESSCLOSE=lessclose.sh %s %s
>> G_BROKEN_FILENAMES=1
>> JAVA_ROOT=/usr/lib/jvm/java
>> COLORTERM=1
>> _=/usr/bin/env
>>
>>
>> pearman at master:~> which mpirun
>> /opt/openmpi/bin/mpirun
>>
>> Also note: MPICH is also installed and appears in the PATH after
>> openmpi.
>>
>> BTW, I have seen a number of posts concerning the pt2pt error message,
>> but I was unable to see how they might apply to fixing the current
>> problem.
>>
>> Help would be greatly appreciated.
>>
>> Peter
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
> 

-- 
Peter B. Pearman
Land Use Dynamics
Swiss Federal Research Institute WSL
Zürcherstrasse 111
CH-8903 Birmensdorf, Switzerland

pearman at wsl.ch
++41 (0)44 739 25 24


