[R] snow's makeCluster hanging (using Rmpi)

Ramon Diaz-Uriarte rdiaz at cnio.es
Wed Nov 8 11:34:18 CET 2006


On Tuesday 07 November 2006 19:28, Randall C Johnson [Contr.] wrote:
> On 11/7/06 11:28 AM, "Ramon Diaz-Uriarte" <rdiaz at cnio.es> wrote:
> > On Tuesday 07 November 2006 15:56, Randall C Johnson [Contr.] wrote:
> >> Hello everyone,
> >> I've been fiddling around with the snow and Rmpi packages on my new
> >> Intel Mac, and have run into a few problems. When I make a cluster on my
> >> machine, both slaves start up just fine, and everything works as
> >> expected. When I try to make a cluster including another networked
> >> machine it hangs. I've followed the suggestions at
> >> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/83086.html and
> >> http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html but to no avail.
> >> Everything seems to start up fine using lamboot, but then hangs when
> >> making the cluster in R. Making a cluster with 2 slaves seems to work
> >> fine, but if I increase the number (to use the networked machines) it
> >> hangs again.
> >>
> >> I've tried networking to another Mac, and also to a machine running Red
> >> Hat Linux. Both machines can set up their own local clusters. Does
> >> anyone have any ideas?
> >
> > Dear Randy,
> >
> > A few suggestions:
> >
> > a) make sure there are no firewalls; I assume this is actually the case,
> > but anyway;
>
> I don't think I have any firewalls running. I checked and they all seem to
> be disabled...
>

To double-check, you can run the following command as root (under GNU/Linux
at least):

iptables -L

If there is no iptables-based firewall, you should see something like:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Make sure this is what you get on all the machines.
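
If you have several machines to check, a rough sketch like the one below might
save some reading. It only looks at the text printed by "iptables -L": it
accepts the output if every chain has policy ACCEPT and no chain lists a rule.
The function name and the host names in the commented usage are made up for
illustration, and the check is deliberately conservative (a user-defined chain
counts as "not open").

```shell
#!/bin/sh
# Reads `iptables -L` output on stdin. Exits 0 only if every chain line says
# "(policy ACCEPT)" and no rule lines follow the column headers.
# (Hypothetical helper name; not part of iptables.)
iptables_is_open() {
    awk '
        BEGIN { bad = 0 }
        /^Chain /      { if ($0 !~ /\(policy ACCEPT\)/) bad = 1; next }
        /^target[ \t]/ { next }   # column header line
        /^[ \t]*$/     { next }   # blank separator between chains
                       { bad = 1 } # anything else is a rule
        END { exit bad }
    '
}

# Usage sketch, assuming root ssh access; host names are placeholders:
# for h in node1 node2; do
#     ssh "root@$h" iptables -L | iptables_is_open \
#         || echo "$h: iptables rules or non-ACCEPT policy found"
# done
```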


> > b) what happens if you lamboot outside R (and create a universe with a
> > local and a networked machine) and then you do: "lamexec -np 6 hostname"?
>
> This prints out the host names of each machine as expected.
>

OK, so it's not LAM itself (so a) is probably unneeded).


> > c) are the Rmpi and snow installed in the same directories in the
> > different machines? are there version differences in Rmpi (or Snow)
> > between machines?
>
> I've installed the same versions, but they are in different directories...
>

I think I remember that having Rmpi and Snow in different directories tended
to cause problems. Now, I always place them in the same directory. I think
that one of Rmpi's sh scripts looks for other scripts, and if they are not
where it expects them, it fails.
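
One possible way to compare the install locations across machines is sketched
below. The helper reads lines of the form "host path" and succeeds only if all
hosts report the same path. The helper name and the host names are invented
for the example, the commented usage assumes passwordless ssh and R on the
PATH of each machine, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Reads "host path" lines on stdin; exits 0 if at most one distinct path
# appears, 1 otherwise. (Hypothetical helper, for illustration only.)
same_path_everywhere() {
    awk '
        { seen[$2] = 1 }
        END { n = 0; for (p in seen) n++; exit (n > 1) }
    '
}

# Usage sketch; mac1 and linux1 are placeholder host names:
# for h in mac1 linux1; do
#     p=$(ssh "$h" "echo 'cat(system.file(package = \"Rmpi\"))' | R --slave --no-save")
#     printf '%s %s\n' "$h" "$p"
# done | same_path_everywhere || echo "Rmpi lives in different directories"
```

The same loop with package = "snow" would cover the other package.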


> I also tried an example per Luke Tierney's suggestion using only Rmpi, and
> I get the following error when trying to spawn the Rslaves after starting
> up with lamboot (outside of R). I tried to use laminfo, but I'm not sure
> what I'm looking for or how to use the information given...
>
> > library(Rmpi)
> > mpi.spawn.Rslaves()
>
> ----------------------------------------------------------------------------
>
> It seems that [at least] one of the child processes that was started
> by MPI_Comm_spawn* chose a different RPI than the parent MPI
> application.  For example, one (of the) child process(es) that
> differed from the parent is shown below:
>
>     Parent application: MPI_Comm_spawn
>     Child MPI_COMM_WORLD rank usysv (v7.1.0): 0
>
> All MPI processes must choose the same RPI module and version when
> they start.  Check your SSI settings and/or the local environment
> variables on each node.
> ----------------------------------------------------------------------------
> R(26444) malloc: ***  Deallocation of a pointer not malloced: 0x16379a0;
> This could be a double free(), or free() called with the middle of an
> allocated block; Try setting environment variable MallocHelp to see tools
> to help debug
> Error in mpi.comm.spawn(slave = system.file("Rslaves.sh", package = "Rmpi"),
>
>     MPI_Error_string: unclassified
>

Now that is way over my head. A few things I'd check: Are you mixing 32-bit 
with 64-bit machines? (I've done that in the past, x86 and x86_64, without 
apparent problems, but I've never used Macs for this). Can you try using two 
different machines with the same architecture? What about gcc compilers: are 
you using very different versions on each machine?
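
To make that comparison concrete, here is a small sketch that prints one line
per machine with the node name, the hardware architecture and the gcc version;
you would run it on each machine and compare the lines by eye. The function
name is made up, and "none" is just a placeholder printed when gcc is not
found.

```shell
#!/bin/sh
# Prints "<node name> <architecture> <gcc version>" for the local machine.
# (Hypothetical helper name; uses only uname and gcc -dumpversion.)
node_fingerprint() {
    gcc_version=$(gcc -dumpversion 2>/dev/null || echo none)
    printf '%s %s %s\n' "$(uname -n)" "$(uname -m)" "$gcc_version"
}

node_fingerprint
```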


Best,

R.


> > HTH,
> >
> > R.
> >
> >> Thanks,
> >> Randy
> >>
> >>> sessionInfo()
> >>
> >> R version 2.4.0 Patched (2006-10-03 r39576)
> >> i386-apple-darwin8.8.2
> >>
> >> locale:
> >> C
> >>
> >> attached base packages:
> >> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
> >> [7] "base"
> >>
> >> other attached packages:
> >>    Rmpi    snow
> >> "0.5-3" "0.2-2"

-- 
Ramón Díaz-Uriarte
Bioinformatics 
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://ligarto.org/rdiaz
PGP KeyID: 0xE89B3462
(http://ligarto.org/rdiaz/0xE89B3462.asc)


