[R-sig-hpc] snow clusters on Windows

Stephen Weston stephen.b.weston at gmail.com
Tue Jan 26 20:30:25 CET 2010


makeSOCKcluster will hang if it cannot successfully start all of the
cluster slaves.  I believe that your second R/snow session is "unhanging"
the first session because it is starting up slaves for the first session.  You
can actually do something very much like that, in a more supported way,
by using the "manual" option with makeSOCKcluster.  It will display a command
to use to start each of the slaves.  I think that might be worth doing, because
it might uncover the error that's occurring.
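
As a rough sketch of the "manual" approach (assuming the snow package is
installed, and using the host name from your message), something like this
should print the command for starting each slave by hand:

```r
# Host to start a slave on; "Poopy" is the machine name from the question.
hosts <- c("Poopy")

# Manual mode waits for someone to run the printed command, so only
# attempt it from an interactive session.
if (interactive()) {
  library(snow)

  # snow prints one command per host and then waits for the slaves to
  # connect; run each command in a terminal on the corresponding machine.
  cl <- makeSOCKcluster(hosts, manual = TRUE)

  # Once makeSOCKcluster returns, ask each slave for its node name.
  print(clusterCall(cl, function() Sys.info()[["nodename"]]))

  stopCluster(cl)
}
```

If the command fails when you run it by hand, the error it prints is
probably the same one that makes the automatic startup hang.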

In general, it's a good idea to use the makeSOCKcluster "outfile" option, which
will capture error messages in a file on each of the slaves, but that won't help
if you aren't able to start the slaves running in the first place.

A common problem is that the slaves can't connect back to the master after
they're started.  That can be fixed by specifying the makeSOCKcluster
"master" option.  You can diagnose that problem from the log files created
using the "outfile" option.  But I doubt that you're having that problem;
otherwise it would also happen when you specified "localhost" to
makeSOCKcluster.
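
For a quick look at such a log file from R, a small helper along these
lines may be enough.  The function name show_worker_log is made up for
illustration, and the path in the usage comment is only an example; the
file lives on the machine where the slave ran.

```r
# Return the last n lines of a slave log file written via "outfile".
# show_worker_log is a hypothetical helper, not part of snow.
show_worker_log <- function(path, n = 20) {
  if (!file.exists(path)) {
    # No log yet: the slave probably never started on this machine.
    return(character(0))
  }
  tail(readLines(path, warn = FALSE), n)
}

# Example usage (the path is an assumption):
# writeLines(show_worker_log("C:/temp/snow.log"))
```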

On Windows, I often specify the slaves and the master using IP addresses,
using a command such as:

  cl <- makeSOCKcluster(c("192.168.0.101", "192.168.0.102"),
                        outfile="C:/temp/snow.log", master="192.168.0.100")

- Steve


On Tue, Jan 26, 2010 at 11:44 AM, Noah Charney <noah at bio.umass.edu> wrote:
> Guy et al,
>
> Thanks for the pointers! I appear to have gotten much of the way there
> (solution described below), but there remains a very strange problem with
> makeCluster().  I'm still just testing it out on a single computer, and,
> as before, it works fine when I call:
>
>> makeSOCKcluster("localhost")
>
> But if I specify my local host by name:
>
>> makeSOCKcluster("Poopy")
>
> It will hang up indefinitely until I open a second R workspace/window on
> the same machine, and try to call the localhost:
>
>> cl<-makeSOCKcluster("localhost")
> Error in socketConnection(port = port, server = TRUE, blocking = TRUE,  :
>  cannot open the connection
> In addition: Warning message:
> In socketConnection(port = port, server = TRUE, blocking = TRUE,  :
>  port 10187 cannot be opened
>
> This error message seems to be a good sign, because when I now look back
> at the original R workspace, it will have completed making the cluster.
> If I try to make a 3 node cluster, then I need to "nudge" it as above 3
> times from the other workspace.  Once the cluster is established,
> clusterApply seems to work fine.  Thoughts?
>
> ------
> To get ssh running from R in the first place on Windows, which was my
> original question:
>
> Install copssh (http://www.itefix.no/i2/download)
>
> Add the ...copssh/bin/ directory to the Path variable in Windows (Control
> Panel -> System -> Advanced System Settings -> System Variables -> Path)
>
> To speed things up considerably, change UseDNS to "no" (and delete the
> preceding # to uncomment it) in C:\Program Files\CopSSH\etc\sshd_config
>
> I also added a "hosts" file to the CopSSH\etc\ directory with the local IP
> and hostname, but I don't think this was necessary
>
> Follow the directions to set up password-less ssh login from
> http://nws-r.sourceforge.net/docs/getting_started.html :
>
>        To generate public and private keys, follow the steps below.
>        Open a DOS terminal
>        ssh-keygen -t rsa
>        cd .ssh (the .ssh directory is located in
>          C:/Program Files/copssh/home/user on Windows)
>        cp id_rsa.pub authorized_keys (this step allows password-less
>          login to the local machine)
>        For all remote machines that you want password-less login to,
>          append the content of id_rsa.pub to their authorized_keys file.
>        To test the password-less login, type the following command:
>        % ssh hostname date
>        If everything is set up correctly, you should not be asked for a
>          password and the current date on the remote machine will be
>          returned.
>
>
> Now, from R, we should be able to type:
>> system("ssh #insert computer name here# date")
> Tue Jan 26 11:30:31 EST 2010
>>
>
> Thanks
> -Noah Charney
> --
> Organismic and Evolutionary Biology
> University of Massachusetts Amherst
> 221 Morrill Science Center South
> Amherst, MA 01003
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>


