[R-sig-hpc] snow clusters on Windows

Stephen Weston stephen.b.weston at gmail.com
Sat Mar 6 19:38:26 CET 2010


Hi Noah,

What happens when you start the slaves manually on other
machines on your network?  Do they simply hang?

One of the reasons that it's useful to use manual mode is that
it makes it easier to see any error message.  And if they do
hang, you can hit CTRL-C, and then type "traceback()" to
find out where the program was hung.  I would need some
more information of that sort before I could even guess what
your problem is.
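
For reference, a manual-mode start might look roughly like the sketch
below.  The worker and master addresses and the log path are placeholders,
and the helper name start_manual_cluster is made up for illustration; only
makeSOCKcluster and its "manual", "outfile" and "master" options come from
snow.

```r
# Sketch only: the addresses, log path and helper name are hypothetical.
start_manual_cluster <- function(workers, master, logfile) {
  # manual = TRUE makes snow print the command to run on each worker
  # instead of launching it over ssh; the call then waits until every
  # worker has connected back to the master.
  snow::makeSOCKcluster(workers, manual = TRUE,
                        outfile = logfile, master = master)
}

# Example call (blocks until the worker has been started by hand):
# cl <- start_manual_cluster("192.168.0.101", "192.168.0.100",
#                            "C:/temp/snow.log")
```

Run the printed command in a shell on the remote machine and watch that
window for the error message.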

That being said, have you ruled out any problems due to
a firewall?  That is a classic problem when going from one
to many machines.
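
One rough way to test for a blocked port from R is sketched below.  It
assumes the master is using port 10187 (the port in your earlier error
message) and that something is listening there when you run it from the
remote machine.  can_connect is a hypothetical helper, not part of snow.
If the master is sitting in makeSOCKcluster at the time, the probe may be
mistaken for a slave, so a throwaway listener on the master is probably
the safer target.

```r
# Sketch only: can_connect() is a hypothetical helper, not part of snow.
# It returns TRUE if a TCP connection to host:port can be opened and
# FALSE otherwise (refused, timed out, or otherwise unreachable).
can_connect <- function(host, port = 10187, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, server = FALSE,
                       blocking = TRUE, open = "r+", timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# Example, run on a remote machine against the master's address:
# can_connect("192.168.0.100")
```

A FALSE result while a listener is known to be running on the master
would point towards a firewall or routing problem rather than snow.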

- Steve


On Sat, Mar 6, 2010 at 1:10 PM, Noah Charney <noah at bio.umass.edu> wrote:
> Thanks Steve,
>
> I've made a bit more progress firing up local nodes with your suggestions,
> but I'm still stuck.
>
> I can successfully use the "manual=TRUE" option to start up slaves locally
> if I run:
>
>>makeSOCKcluster("128.XXX.XXX.XX", manual=TRUE) #(as long as the IP
> address is for the local computer)
>
> then I run a *.bat file with the requested command from a local command
> shell...
>
> But this only works locally.  I still can't get any slaves started on
> other machines on the network.  I know ssh is still working, because I can
> talk to the other machines in the network using:
>
>>system("ssh 128.XXX.XXX.XX date")
>
> And, as you mentioned, the log file doesn't have much in it since the
> slaves never get established.
>
> Any other suggestions?
>
>
> Thanks again!
> -Noah
> --
> Organismic and Evolutionary Biology
> University of Massachusetts Amherst
> 221 Morrill Science Center South
> Amherst, MA 01003
>
>
> ____________________________________________________________________________
> ############################################################################
>
>
> makeSOCKcluster will hang if it cannot successfully start all of the
> cluster slaves.  I believe that your second R/snow session is "unhanging"
> the first session because it is starting up slaves for the first session.
> You can actually do something very much like that, in a more supported
> way, by using the "manual" option with makeSOCKcluster.  It will display
> a command to use to start each of the slaves.  I think that might be
> worth doing, because it might uncover the error that's occurring.
>
> In general, it's a good idea to use the makeSOCKcluster "outfile" option,
> which will capture error messages in a file on each of the slaves, but
> that won't help if you aren't able to start the slaves running in the
> first place.
>
> A common problem is that the slaves can't connect back to the master after
> they're started.  That can be fixed by specifying the makeSOCKcluster
> "master" option.  You can diagnose that problem from the log files created
> using the "outfile" option.  But I doubt that you're having that problem,
> otherwise that would happen when you specified "localhost" to
> makeSOCKcluster.
>
> On Windows, I often specify the slaves and the master using IP addresses,
> using a command such as:
>
>  cl <- makeCluster(c("192.168.0.101", "192.168.0.102"),
>              outfile="C:/temp/snow.log", master="192.168.0.100")
>
> - Steve
>
>
> On Tue, Jan 26, 2010 at 11:44 AM, Noah Charney <noah at bio.umass.edu> wrote:
>> Guy et al,
>>
>> Thanks for the pointers! I appear to have gotten much of the way there
>> (solution described below), but there remains a very strange problem with
>> makeCluster().  I'm still just testing it out on a single computer, and,
>> as before, it works fine when I call:
>>
>>> makeSOCKcluster("localhost")
>>
>> But if I specify my local host by name:
>>
>>> makeSOCKcluster("Poopy")
>>
>> It will hang up indefinitely until I open a second R workspace/window on
>> the same machine, and try to call the localhost:
>>
>>> cl<-makeSOCKcluster("localhost")
>> Error in socketConnection(port = port, server = TRUE, blocking = TRUE,  :
>>   cannot open the connection
>> In addition: Warning message:
>> In socketConnection(port = port, server = TRUE, blocking = TRUE,  :
>>   port 10187 cannot be opened
>>
>> This error message seems to be a good sign, because when I now look back
>> at the original R workspace, it will have completed making the cluster.
>> If I try to make a 3 node cluster, then I need to "nudge" it as above 3
>> times from the other workspace.  Once the cluster is established,
>> clusterApply seems to work fine.  Thoughts?
>>
>> ------
>> To get ssh running from R in the first place on Windows, which was my
>> original question:
>>
>> Install copssh (http://www.itefix.no/i2/download)
>>
>> Add the ...copssh/bin/ directory to Path variable in windows (Control
>> Panel -> System -> Advanced System Settings -> System Variables -> Path)
>>
>> To speed things up considerably, changed UseDNS to "no" (and deleted
>> preceding # to uncomment) in C:\Program Files\CopSSH\etc\sshd_config
>>
>> I also added a "hosts" file to the CopSSH\etc\ directory with the local IP
>> and hostname, but I don't think this was necessary
>>
>> Followed directions to set up password-less ssh login from
>> http://nws-r.sourceforge.net/docs/getting_started.html :
>>
>>        To generate public and private keys, follow the steps below.
>>        Open a DOS terminal
>>        ssh-keygen -t rsa
>>        cd .ssh (.ssh directory is located in
>>        C:/Program Files/copssh/home/user on Windows)
>>        cp id_rsa.pub authorized_keys
>>        This step allows password-less login to local machine.
>>        For all remote machines that you want password-less login, append
>>        the content of id_rsa.pub to their authorized_keys file.
>>        To test the password-less login, type the following command:
>>        % ssh hostname date
>>        If everything is setup correctly, you should not be asked for
>>        password and the current date on remote machine will be returned.
>>
>>
>> Now, from R, we should be able to type:
>>> system("ssh #insert computer name here# date")
>> Tue Jan 26 11:30:31 EST 2010
>>>
>>
>> Thanks
>> -Noah Charney
>> --
>> Organismic and Evolutionary Biology
>> University of Massachusetts Amherst
>> 221 Morrill Science Center South
>> Amherst, MA 01003
>>
>> _______________________________________________
>> R-sig-hpc mailing list
>> R-sig-hpc at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>>
> Hi Noah,
>
> While I did not use snow, I was able to successfully implement cluster
> computing with windows workstations using NetWorkSpaces:
> http://nws-r.sourceforge.net/
>
> As you noted the key to these solutions is to get SSH running on windows
> boxes which can be accomplished with Cygwin
> http://www.cygwin.com/
>
> The biggest challenge was configuring SSH and here are some links that I
> had found helpful back when I was setting this up:
>
> http://www1.umn.edu/oit/prod/groups/oit/@pub/@oit/@web/@security/documents/asset/oit_asset_001221.pdf
>
> http://lifehacker.com/205090/geek-to-live--set-up-a-personal-home-ssh-server
>
> http://www.bmonday.com/articles/653.aspx
>
> Good Luck!
>
> Best,
>
> -- Guy
>
>
> -----Original Message-----
> From: r-sig-hpc-bounces at r-project.org [mailto:r-sig-hpc-bounces at
> r-project.org] On Behalf Of Noah Charney
> Sent: Monday, January 18, 2010 11:59 AM
> To: r-sig-hpc at r-project.org
> Subject: [R-sig-hpc] snow clusters on Windows
>
> R-HPC list members,
>
> I am wondering if anyone could help us with the setup for snow clusters on
> windows machines.  Is there something specific we need to do to set up SSH
> so that R can access it in the absence of Linux?  We are working on
> Windows Server 2003 and on Vista, and the problem is the same on both.  We
> can create local clusters using the name "localhost" on any computer, but
> no other name/IP address we substitute for "localhost" works.  Example
> code is below.
>
> #This works fine
>> cl<-makeCluster('localhost',type='SOCK')
>> cl
> [[1]]
> $con
>     description           class            mode            text          opened
> "<-Poopy:10187"      "sockconn"           "a+b"        "binary"        "opened"
>        can read       can write
>           "yes"           "yes"
> $host
> [1] "localhost"
> attr(,"class")
> [1] "SOCKnode"
> attr(,"class")
> [1] "SOCKcluster" "cluster"
>
>> stopCluster(cl)
>
> #But if we try to call the home computer by name or IP address, or
> #anything else, it doesn't work...
>> cl<-makeCluster('192.168.1.1',type='SOCK')
>
> #It just hangs there until we hit 'escape', at which point it says:
> Warning message:
> In system(cmd, wait = FALSE, input = "") : ssh not found
>
>
> Thanks
> -Noah Charney
> ---
> Organismic and Evolutionary Biology
> University of Massachusetts Amherst
> 221 Morrill Science Center South
> Amherst, MA 01003-9297
>
>


