[R-sig-hpc] snow clusters on Windows

noah at bio.umass.edu noah at bio.umass.edu
Thu Oct 28 16:55:45 CEST 2010


Hi all,

Just to follow up on an old post: I did eventually get snow working, more
or less. Thanks for all the help!  I still must rely on manually
connecting the nodes [using "manual=TRUE" in makeSOCKcluster() ].  I run a
simple batch file from the remote computer's command prompt that streamlines
making the manual connection happen.
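
For anyone following along, the manual start described above can be sketched roughly as below. This is only a sketch under assumptions: start_manual_cluster is a made-up helper name, the IP addresses and log path are placeholders borrowed from Steve's earlier example in this thread, and it assumes the snow package is installed on every machine.

```r
# Hypothetical helper (not part of snow): start a socket cluster in manual mode.
start_manual_cluster <- function(hosts, master, outfile = "C:/temp/snow.log") {
  if (!requireNamespace("snow", quietly = TRUE)) {
    stop("the snow package must be installed first")
  }
  # manual = TRUE makes snow display the command to run on each node and then
  # wait for that node to connect back to `master`.
  snow::makeSOCKcluster(hosts, manual = TRUE, master = master, outfile = outfile)
}

# Example call (placeholder addresses; blocks until every node has connected):
# cl <- start_manual_cluster(c("192.168.0.101", "192.168.0.102"),
#                            master = "192.168.0.100")
# snow::clusterCall(cl, function() Sys.info()[["nodename"]])
# snow::stopCluster(cl)
```

The command that snow displays for each node is what goes into the batch file that gets run on the remote machine.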

In order to set up passwordless ssh on our cluster, I went through these steps:

1)      Download and install CopSSH (which created another user account
“svcCOPSSH”)

2)      Activate my user name

3)      Add the “C:/Program Files/CopSSH/Bin” directory to the Path
variable in Windows (Control Panel -> System -> Advanced System Settings ->
System Variables -> Path)

4)      To speed things up, change UseDNS to "no" (and delete the preceding
# to uncomment it) in C:\Program Files\CopSSH\etc\sshd_config

5)      Open a DOS terminal, and type “ssh-keygen -t rsa”.  Leave the
passphrase blank; just hit return everywhere it asks.  Copy the key
from “C:/Program Files/copssh/home/[username]/.ssh/id_rsa.pub” into (as an
extra line) “C:/Program Files/copssh/home/[username]/.ssh/authorized_keys”.
All computers can have a copy of the same authorized_keys file, with a line
for each computer's key.

6)      Now, from R, you should be able to type system("ssh [computer
name] date") from any computer, and it will give you the date on
[computer name].

7)      Manually let the chosen port through the firewall.  A simple
firewall test seems to be the following.  On the master computer you type:

               > socketConnection(port = XXXXX, server = TRUE)

                    And on the slave you type:

               > socketConnection("computername", port = XXXXX)
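
The checks in steps 6 and 7 can be wrapped in two small functions so they are easy to repeat for every machine. This is a rough sketch, not something from snow: ssh_works and port_open are made-up names, the 5-second timeout is arbitrary, and BatchMode=yes is only there so that ssh fails instead of stopping to ask for a password.

```r
# Hypothetical helpers for steps 6 and 7; names and defaults are illustrative.

# Step 6: TRUE if "ssh <host> date" exits with status 0.
ssh_works <- function(host) {
  if (!nzchar(Sys.which("ssh"))) return(FALSE)  # ssh is not on the Path
  status <- suppressWarnings(
    system2("ssh", c("-o", "BatchMode=yes", host, "date"),
            stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

# Step 7: TRUE if a TCP connection to host:port can be opened within
# `timeout` seconds, FALSE if it is refused, blocked, or times out.
port_open <- function(host, port, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host, port = port, open = "r+", blocking = TRUE,
                       timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# Example (placeholder name and port), run on the slave:
# ssh_works("computername")
# port_open("computername", 10187)
```

Note that port_open can only return TRUE while something is listening on the master, for example the socketConnection(port = XXXXX, server = TRUE) call above, so FALSE can mean either a blocked port or nothing waiting on it.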

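The key copying in step 5 can also be done from R instead of by hand. The sketch below is only an illustration: add_public_key is a made-up helper, and it assumes one key per line in both files. It rewrites the file with plain "\n" line endings, which is my assumption about what the ssh server prefers, not something I have checked on CopSSH.

```r
# Hypothetical helper: add the key(s) in a .pub file to an authorized_keys
# file, skipping any line that is already present.
# Returns (invisibly) the number of keys that were added.
add_public_key <- function(pub_key_file, authorized_keys_file) {
  keys <- trimws(readLines(pub_key_file, warn = FALSE))
  keys <- keys[nzchar(keys)]
  existing <- character(0)
  if (file.exists(authorized_keys_file)) {
    existing <- trimws(readLines(authorized_keys_file, warn = FALSE))
    existing <- existing[nzchar(existing)]
  }
  new_keys <- setdiff(keys, existing)
  if (length(new_keys) > 0) {
    # Binary mode so that "\n" is written as-is, also on Windows.
    con <- file(authorized_keys_file, open = "wb")
    on.exit(close(con))
    writeLines(c(existing, new_keys), con, sep = "\n")
  }
  invisible(length(new_keys))
}

# Example (same placeholder paths as in step 5):
# ssh_dir <- "C:/Program Files/copssh/home/[username]/.ssh"
# add_public_key(file.path(ssh_dir, "id_rsa.pub"),
#                file.path(ssh_dir, "authorized_keys"))
```

Running it once per computer's id_rsa.pub builds the shared authorized_keys file with one line per computer.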

Best of luck
-Noah
 

On Sat, Mar 6, 2010 at 2:38 PM, Stephen Weston
<stephen.b.weston at gmail.com> wrote:
>
> Hi Noah,
>
> What happens when you start the slaves manually on other
> machines on your network?  Do they simply hang?
>
> One of the reasons that it's useful to use manual mode is that
> it makes it easier to see any error message.  And if they do
> hang, you can hit CTRL-C, and then type "traceback()" to
> find out where the program was hung.  I would need some
> more information of that sort before I could even guess what
> your problem is.
>
> That being said, have you ruled out any problems due to
> a firewall?  That is a classic problem when going from one
> to many machines.
>
> - Steve
>
>
> On Sat, Mar 6, 2010 at 1:10 PM, Noah Charney <noah at bio.umass.edu> wrote:
> > Thanks Steve,
> >
> > I've made a bit more progress firing up local nodes with your suggestions,
> > but I'm still stuck.
> >
> > I can successfully use the "manual=TRUE" option to start up slaves locally
> > if I run:
> >
> >>makeSOCKcluster("128.XXX.XXX.XX", manual=TRUE) #(as long as the IP
> > address is for the local computer)
> >
> > then I run a *.bat file with the requested command from a local command
> > shell...
> >
> > But this only works locally.  I still can't get any slaves started on
> > other machines on the network.  I know ssh is still working, because I can
> > talk to the other machines in the network using:
> >
> >>system("ssh 128.XXX.XXX.XX date")
> >
> > And, as you mentioned, the log file doesn't have much in it since the
> > slaves never get established.
> >
> > Any other suggestions?
> >
> >
> > Thanks again!
> > -Noah
> > --
> > Organismic and Evolutionary Biology
> > University of Massachusetts Amherst
> > 221 Morrill Science Center South
> > Amherst, MA 01003
> >
> >
> > ____________________________________________________________________________
> > ############################################################################
> >
> >
> > makeSOCKcluster will hang if it cannot successfully start all of the
> > cluster slaves.  I believe that your second R/snow session is "unhanging"
> > the first session because it is starting up slaves for the first session.
> > You can actually do something very much like that, in a more supported way,
> > by using the "manual" option with makeSOCKcluster.  It will display a
> > command to use to start each of the slaves.  I think that might be worth
> > doing, because it might uncover the error that's occurring.
> >
> > In general, it's a good idea to use the makeSOCKcluster "outfile" option,
> > which will capture error messages in a file on each of the slaves, but that
> > won't help if you aren't able to start the slaves running in the first place.
> >
> > A common problem is that the slaves can't connect back to the master after
> > they're started.  That can be fixed by specifying the makeSOCKcluster
> > "master" option.  You can diagnose that problem from the log files created
> > using the "outfile" option.  But I doubt that you're having that problem,
> > otherwise that would happen when you specified "localhost" to
> > makeSOCKcluster.
> >
> > On Windows, I often specify the slaves and the master using IP addresses,
> > using a command such as:
> >
> >  cl <- makeCluster(c("192.168.0.101", "192.168.0.102"),
> >              outfile="C:/temp/snow.log", master="192.168.0.100")
> >
> > - Steve
> >
> >
> > On Tue, Jan 26, 2010 at 11:44 AM, Noah Charney <noah at bio.umass.edu> wrote:
> >> Guy et al,
> >>
> >> Thanks for the pointers! I appear to have gotten much of the way there
> >> (solution described below), but there remains a very strange problem with
> >> makeCluster().  I'm still just testing it out on a single computer, and,
> >> as before, it works fine when I call:
> >>
> >>> makeSOCKcluster("localhost")
> >>
> >> But if I specify my local host by name:
> >>
> >>> makeSOCKcluster("Poopy")
> >>
> >> It will hang up indefinitely until I open a second R workspace/window on
> >> the same machine, and try to call the localhost:
> >>
> >>> cl<-makeSOCKcluster("localhost")
> >> Error in socketConnection(port = port, server = TRUE, blocking = TRUE,  :
> >>   cannot open the connection
> >> In addition: Warning message:
> >> In socketConnection(port = port, server = TRUE, blocking = TRUE,  :
> >>   port 10187 cannot be opened
> >>
> >> This error message seems to be a good sign, because when I now look back
> >> at the original R workspace, it will have completed making the cluster.
> >> If I try to make a 3 node cluster, then I need to "nudge" it as above 3
> >> times from the other workspace.  Once the cluster is established,
> >> clusterApply seems to work fine.  Thoughts?
> >>
> >> ------
> >> To get ssh running from R in the first place on Windows, which was my
> >> original question:
> >>
> >> Install copssh (http://www.itefix.no/i2/download)
> >>
> >> Add the ...copssh/bin/ directory to Path variable in windows (Control
> >> Panel -> System -> Advanced System Settings -> System Variables -> Path)
> >>
> >> To speed things up considerably, changed UseDNS to "no" (and deleted
> >> preceding # to uncomment) in C:\Program Files\CopSSH\etc\sshd_config
> >>
> >> I also added a "hosts" file to the CopSSH\etc\ directory with the local IP
> >> and hostname, but I don't think this was necessary
> >>
> >> Followed directions to set up password-less ssh login from
> >> http://nws-r.sourceforge.net/docs/getting_started.html :
> >>
> >>        To generate public and private keys, follow the steps below.
> >>        Open a DOS terminal
> >>        ssh-keygen -t rsa
> >>        cd .ssh (.ssh directory is located in C:/Program Files/copssh/home/user
> >>        on Windows)
> >>        cp id_rsa.pub authorized_keys This step allows password-less login to
> >>        local machine.
> >>        For all remote machines that you want password-less login, append the
> >>        content of id_rsa.pub to their authorized_keys file.
> >>        To test the password-less login, type the following command:
> >>        % ssh hostname date
> >>        If everything is setup correctly, you should not be asked for password
> >>        and the current date on remote machine will be returned.
> >>
> >>
> >> Now, from R, we should be able to type:
> >>> system("ssh #insert computer name here# date")
> >> Tue Jan 26 11:30:31 EST 2010
> >>>
> >>
> >> Thanks
> >> -Noah Charney
> >> --
> >> Organismic and Evolutionary Biology
> >> University of Massachusetts Amherst
> >> 221 Morrill Science Center South
> >> Amherst, MA 01003
> >>
> >> _______________________________________________
> >> R-sig-hpc mailing list
> >> R-sig-hpc at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
> >>
> > Hi Noah,
> >
> > While I did not use snow, I was able to successfully implement cluster
> > computing with windows workstations using NetWorkSpaces:
> > http://nws-r.sourceforge.net/
> >
> > As you noted the key to these solutions is to get SSH running on windows
> > boxes which can be accomplished with Cygwin
> > http://www.cygwin.com/
> >
> > The biggest challenge was configuring SSH and here are some links that I
> > had found helpful back when I was setting this up:
> >
> > http://www1.umn.edu/oit/prod/groups/oit/@pub/@oit/@web/@security/documents/asset/oit_asset_001221.pdf
> >
> > http://lifehacker.com/205090/geek-to-live--set-up-a-personal-home-ssh-server
> >
> > http://www.bmonday.com/articles/653.aspx
> >
> > Good Luck!
> >
> > Best,
> >
> > -- Guy
> >
> >
> > -----Original Message-----
> > From: r-sig-hpc-bounces at r-project.org [mailto:r-sig-hpc-bounces at r-project.org] On Behalf Of Noah Charney
> > Sent: Monday, January 18, 2010 11:59 AM
> > To: r-sig-hpc at r-project.org
> > Subject: [R-sig-hpc] snow clusters on Windows
> >
> > R-HPC list members,
> >
> > I am wondering if anyone could help us with the setup for snow clusters on
> > Windows machines.  Is there something specific we need to do to set up SSH
> > so that R can access it in the absence of Linux?  We are working on
> > Windows Server 2003 and on Vista; the problem is the same on both.  We
> > can create local clusters using the name "localhost" on any computer, but
> > no other name/IP address we substitute for "localhost" works.  Example
> > code is below.
> >
> > #This works fine
> >> cl<-makeCluster('localhost',type='SOCK')
> >> cl
> > [[1]]
> > $con
> >    description           class            mode            text          opened
> > "<-Poopy:10187"      "sockconn"           "a+b"        "binary"        "opened"
> >       can read       can write
> >          "yes"           "yes"
> > $host
> > [1] "localhost"
> > attr(,"class")
> > [1] "SOCKnode"
> > attr(,"class")
> > [1] "SOCKcluster" "cluster"
> >
> >> stopCluster(cl)
> >
> > #But if we try to call the home computer by name or IP address, or
> > anything else, it doesn't work...
> >> cl<-makeCluster('192.168.1.1',type='SOCK')
> >
> > #It just hangs there until we hit 'escape', at which point it says:
> > Warning message:
> > In system(cmd, wait = FALSE, input = "") : ssh not found
> >
> >
> > Thanks
> > -Noah Charney
> > ---
> > Organismic and Evolutionary Biology
> > University of Massachusetts Amherst
> > 221 Morrill Science Center South
> > Amherst, MA 01003-9297
> >


