[R-sig-hpc] snow, socket cluster: problem with path to rscript
Dirk Eddelbuettel
edd at debian.org
Fri Apr 17 17:28:33 CEST 2009
On 17 April 2009 at 20:10, Matthieu Stigler wrote:
| Steve Weston wrote:
| > I just noticed that you're running R 2.7.1 on your 192.100.100.212
| > machine. I believe there are known socketConnection issues
| > with that version of R that Luke fixed as of R 2.7.2. So I strongly
| > suggest that you upgrade your version of R.
| >
| I upgraded to R 2.8, but unfortunately that changes nothing: port
| 10187 is still reported as closed...
For what it is worth, I cannot connect to that port with telnet either on
Ubuntu at work, yet snow works just fine:
edd@l1:~$ telnet l2 10187
Trying xxx.xx.50.99...
telnet: Unable to connect to remote host: Connection refused
edd@l1:~$ telnet l1 10187
Trying xxx.xx.50.97...
telnet: Unable to connect to remote host: Connection refused
edd@l1:~$ r -lsnow -e'cl <- makeCluster(c("l1","l2"), "SOCK"); print(str(cl)); stopCluster(cl)'
List of 2
$ :List of 3
..$ con :Classes 'sockconn', 'connection' atomic [1:1] 3
.. .. ..- attr(*, "conn_id")=<externalptr>
..$ host: chr "l1"
..$ rank: int 1
..- attr(*, "class")= chr "SOCKnode"
$ :List of 3
..$ con :Classes 'sockconn', 'connection' atomic [1:1] 4
.. .. ..- attr(*, "conn_id")=<externalptr>
..$ host: chr "l2"
..$ rank: int 2
..- attr(*, "class")= chr "SOCKnode"
- attr(*, "class")= chr [1:2] "SOCKcluster" "cluster"
NULL
edd@l1:~$
Maybe that socket-to-port-10187 thing is not really relevant...
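As far as I understand R's socket connections, a server-mode socketConnection() only listens while it is waiting for its single client. Outside a running makeCluster() call nothing is bound to port 10187, so a refused telnet is what one would expect. The same behaviour would explain why socketConnection(port = 10187, server = TRUE) appears to do nothing: the call blocks until a client connects or the timeout expires. A minimal sketch, assuming a local test on localhost with a second R process standing in for the worker (the message text and the 2-second delay are made up for illustration):

```r
port <- 10187L  # port number from this thread; any free port would do

# Stand-in for the worker: a background R process that waits 2 seconds,
# connects to the master, sends one line, and exits.
worker_code <- sprintf(
  'Sys.sleep(2); con <- socketConnection("localhost", port = %d, blocking = TRUE, open = "r+"); writeLines("hello from worker", con); close(con)',
  port
)
system2(file.path(R.home("bin"), "Rscript"),
        c("-e", shQuote(worker_code)), wait = FALSE)

# Master side: prints nothing and blocks until the worker connects
# (or until the timeout, in seconds, runs out).
srv <- socketConnection(port = port, server = TRUE, blocking = TRUE,
                        open = "r+", timeout = 30)
msg <- readLines(srv, n = 1)  # read the single line the worker sent
close(srv)
print(msg)
```

If the master call returns with a message from the worker, the port itself is usable and the problem is more likely in how the worker resolves or reaches the master's host name.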
Dirk
| I obviously have a problem opening the port; should I perhaps post on
| the Debian list or on other forums instead? When I run nc -l -p 10187,
| telnet xxx.212 10187 works (I did this on both machines), but
| makeCluster still has the same issue, as does running this from the
| worker:
|
| socketConnection("ubuntu", port = 10187)
| 192.100.100.212:10187 cannot be opened
|
|
| and with:
|
| socketConnection(port = 10187, server = TRUE)
|
| nothing happens. What is the expected output?
|
| Thanks a lot for your help and advice!!!
|
| Mat
| > --
| > Steve Weston
| > REvolution Computing
| > One Century Tower | 265 Church Street, Suite 1006
| > New Haven, CT 06510
| > P: 203-777-7442 x266 | www.revolution-computing.com
| >
| >
| > On Thu, Apr 16, 2009 at 4:52 AM, Matthieu Stigler
| > <matthieu.stigler at gmail.com> wrote:
| >
| >> luke at stat.uiowa.edu wrote:
| >>
| >>> On Wed, 15 Apr 2009, Matthieu Stigler wrote:
| >>>
| >>>
| >>>> Steve Weston wrote:
| >>>>
| >>>>> On Tue, Apr 14, 2009 at 5:29 AM, Matthieu Stigler
| >>>>> <matthieu.stigler at gmail.com> wrote:
| >>>>>
| >>>>>
| >>>>>
| >>>>>> So it is now working for the local computer. However, when trying to
| >>>>>> use the external computer, it seems to be working, but nothing happens
| >>>>>> after it asks for the last password...
| >>>>>>
| >>>>>>
| >>>>> What this tells you is that "something went wrong". The basic strategy
| >>>>> in this case is to use the "outfile" option to hopefully capture an
| >>>>> error message. You might need to set outfile differently for different
| >>>>> slaves, particularly if you're starting more than one on the same
| >>>>> machine, but I suggest just starting one slave on 210 to avoid the
| >>>>> issue. So do something like:
| >>>>>
| >>>>> > host210 <- list(host = "mat@192.100.100.210", rscript = "/usr/bin/Rscript",
| >>>>> +                 outfile = "/tmp/log.txt")
| >>>>> > cl2 <- makeCluster(list(host210), type = "SOCK")
| >>>>>
| >>>> OK, thanks for pointing out this method.
| >>>>
| >>>> I tried it and got the following error message. It does not seem to be
| >>>> computer specific (I also tried it towards the other host, 213, and from
| >>>> 213 to 212, always with the same error message):
| >>>>
| >>>> starting worker for ubuntu:10187
| >>>> Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
| >>>>   unable to open connection
| >>>> Calls: local ... slaveLoop -> recvData -> makeSOCKmaster -> socketConnection
| >>>> In addition: Warning message:
| >>>> In socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
| >>>>   ubuntu:10187 cannot be opened
| >>>> Execution halted
| >>>>
| >>>>
| >>>> Is it related to ssh or snow? I did not find any reference to this problem
| >>>> when googling for it...
| >>>>
| >>> It is an issue with your ability to make a socket connection to the
| >>> master. Most likely the master computer has a firewall that is
| >>> blocking connections to the port snow uses. Try turning the firewall
| >>> off or at least enabling the port in the error message.
| >>> A simple test is to do
| >>>
| >>> socketConnection(port = 10187, server = TRUE)
| >>>
| >>> in an R session on the master and
| >>>
| >>> telnet ubuntu 10187
| >>>
| >>> in a shell on your worker machine (assuming your master is called
| >>> ubuntu), or you can use R and
| >>>
| >>> socketConnection("ubuntu", port = 10187)
| >>>
| >>> in an R session on the worker.
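The worker-side half of this test can be wrapped so that it returns TRUE or FALSE instead of stopping with an error. This is only a sketch: can_connect() is a hypothetical helper, not part of R or snow, and the timeout values are arbitrary. It assumes the master is already waiting in socketConnection(port = 10187, server = TRUE) when it is called.

```r
# Hypothetical helper (not part of R or snow): TRUE if a TCP connection to
# host:port can be opened within `timeout` seconds, FALSE otherwise.
can_connect <- function(host, port, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host, port = port, blocking = TRUE,
                       open = "r+", timeout = timeout)
    ),
    error = function(e) NULL  # refused, unreachable, or unknown host
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# On the worker one would pass the master's name, e.g. "ubuntu".
# Here localhost is used so the snippet runs anywhere; the result is
# FALSE unless something is listening on that port.
print(can_connect("localhost", 10187L, timeout = 2))
```

A FALSE result while the master is listening points at name resolution or a firewall between the two machines; a FALSE result with no listener on the master is expected and says nothing about the network.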
| >>>
| >>> luke
| >>>
| >>>
| >> Thanks Luke and Dirk for your help!
| >>
| >> I don't think it is a firewall error, as both machines have all ports open
| >> (the default with iptables, as I understand it), and the network admin
| >> even opened port 10187.
| >>
| >> I first tried the three tests suggested; none of them seems to give good
| >> results:
| >>
| >> $telnet 192.100.100.212 10187
| >>
| >> Trying 192.100.100.212...
| >>
| >> telnet: Unable to connect to remote host: Connection refused
| >>
| >> R> socketConnection(port = 10187, server = TRUE)
| >>
| >> # nothing happens... is that right?
| >>
| >> R> socketConnection("192.100.100.212", port = 10187)
| >> Error in socketConnection("192.100.100.212", port = 10187) :
| >>   unable to open connection
| >> In addition: Warning message:
| >> In socketConnection("192.100.100.212", port = 10187) :
| >>   192.100.100.212:10187 cannot be opened
| >>
| >> Same error message when using "ubuntu", dsge@192.100.100.212, etc.
| >>
| >> On an Ubuntu forum, someone told me that one has to start a server
| >> listening on the port (sorry, my explanations are not good, as I don't
| >> understand the subject that well :-( ).
| >> So, launching on the master (212):
| >>
| >> $nc -l -p 10187
| >>
| >>
| >> one is then able to connect from 210:
| >>
| >> $telnet 192.100.100.212 10187
| >>
| >> Trying 192.100.100.212...
| >>
| >> Connected to 192.100.100.212.
| >>
| >> Escape character is '^]'.
| >>
| >> So it seems that it is working, but it has no effect on the previous
| >> commands: socketConnection and makeCluster still claim that 10187 can't
| >> be opened.
| >>
| >> With those elements, do you guys see more clearly, or is it even darker?
| >> Thanks a lot for your help!
| >>
| >> Matthieu
| >>
| >>
| >>>> Thanks a lot for your help!!
| >>>>
| >>>>> If it hangs, go to another terminal, ssh to 192.100.100.210, and look at
| >>>>> the contents of /tmp/log.txt, and hopefully that will provide a clue to
| >>>>> the problem.
| >>>>>
| >>>>> Another approach is to use the "manual" option. That will print the
| >>>>> command that you should use to manually start each of the slaves.
| >>>>> You just ssh to that machine from another terminal, and cut and paste
| >>>>> the printed command to start the slave. If you set "outfile" to an
| >>>>> empty string, then output messages will go right to that terminal.
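The advice about giving each slave its own outfile can be sketched as follows. worker_specs() is a hypothetical helper, not part of snow, and the log file naming is made up for illustration; the host, rscript and outfile fields are the ones used in the host210 example in this thread. The makeCluster() calls are left commented out because they need snow and ssh access to the hosts.

```r
# Hypothetical helper (not part of snow): build one host spec per worker,
# each with its own log file, so that two workers on the same machine do
# not write to the same outfile.
worker_specs <- function(hosts, rscript = "/usr/bin/Rscript", log_dir = "/tmp") {
  lapply(seq_along(hosts), function(i) {
    list(host = hosts[[i]],
         rscript = rscript,
         outfile = file.path(log_dir, sprintf("snow_worker_%02d.log", i)))
  })
}

# Two workers on the same machine, as in the case described above
specs <- worker_specs(c("mat@192.100.100.210", "mat@192.100.100.210"))
str(specs)

# Not run here (requires snow and reachable hosts):
# library(snow)
# cl <- makeCluster(specs, type = "SOCK")
# Manual start, with worker output going to the terminal where it is started:
# cl <- makeCluster("192.100.100.210", type = "SOCK", manual = TRUE, outfile = "")
```

After a hang, each log file under /tmp on the worker machine can then be inspected separately.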
| >>>>>
| >>>>> --
| >>>>> Steve Weston
| >>>>>
| >>>>>
| >>>>
| >>
|
--
Three out of two people have difficulties with fractions.