[R-sig-hpc] snow, socket cluster: problem with path to rscript

luke at stat.uiowa.edu luke at stat.uiowa.edu
Fri Apr 17 16:58:58 CEST 2009


On Fri, 17 Apr 2009, Matthieu Stigler wrote:

> Steve Weston wrote:
>> I just noticed that you're running R 2.7.1 on your 192.100.100.212
>> machine.  I believe there are known socketConnection issues
>> with that version of R that Luke fixed as of R 2.7.2.  So I strongly
>> suggest that you upgrade your version of R.
>> 
> I upgraded to R 2.8 but unfortunately this doesn't change anything: port 10187
> is still reported as closed...
>
> I obviously have a problem opening the port; maybe I should rather post on the
> Debian list or on other forums? I use nc -l -p 10187, so that telnet

According to my man page that argument combination is not legal so I
don't know what you actually did.

> xxx.212 10187 is working (I did it on both machines), but I still have that
> issue when running makeCluster, and also when running from the worker:
>
> socketConnection("ubuntu", port = 10187)
> 192.100.100.212:10187 cannot be opened
>
>
> and with:
>
> socketConnection(port = 10187, server = TRUE)
>
> nothing happens. What is actually the expected output?

The server call waits until a connection occurs and then returns an R
connection object.  The client socketConnection call returns a socket
connection if successful and gives an error message if not.

So on the master do

s <- socketConnection(port = 10187, server = TRUE)

and this will wait for a connection and return to the prompt when a
connection occurs.  On the worker machine

telnet master 10187

will either succeed and wait until the server socket is closed or fail
with an error message about not being able to open the port.  If I use

nc master 10187

then on a successful connection nc waits (for input) until the server
closes the socket with close(s) and then returns to the shell prompt.
Failure for me is an immediate return to the shell prompt, with no error
message (and the server side continues to wait).
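
If telnet or nc is not available on the worker, the same client-side check
can be done from R.  The following is only a minimal sketch: can_connect is
a hypothetical helper name (it is not part of snow or base R), and it
assumes a version of R whose socketConnection accepts a timeout argument.
It tries to open a client socket and reports TRUE or FALSE instead of
stopping with an error.

```r
# Hypothetical helper, not part of snow or base R: returns TRUE if a client
# socket to host:port can be opened, FALSE otherwise.
can_connect <- function(host, port, timeout = 5) {
  con <- tryCatch(
    # A failed attempt raises a warning followed by an error; silence the
    # warning and turn the error into NULL.
    suppressWarnings(
      socketConnection(host, port = port, blocking = TRUE,
                       open = "a+b", timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  close(con)  # opened successfully, so release the socket again
  TRUE
}
```

With the master waiting in socketConnection(port = 10187, server = TRUE),
calling can_connect("ubuntu", 10187) on the worker should return TRUE if
the port is reachable (assuming the master is called ubuntu); the server
call on the master then returns to the prompt.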

luke

>
> Thanks a lot for your help and advice!!!
>
> Mat
>> --
>> Steve Weston
>> REvolution Computing
>> One Century Tower | 265 Church Street, Suite 1006
>> New Haven, CT  06510
>> P: 203-777-7442 x266 | www.revolution-computing.com
>> 
>> 
>> On Thu, Apr 16, 2009 at 4:52 AM, Matthieu Stigler
>> <matthieu.stigler at gmail.com> wrote:
>> 
>>> luke at stat.uiowa.edu wrote:
>>> 
>>>> On Wed, 15 Apr 2009, Matthieu Stigler wrote:
>>>>
>>>> 
>>>>> Steve Weston wrote:
>>>>> 
>>>>>> On Tue, Apr 14, 2009 at 5:29 AM, Matthieu Stigler
>>>>>> <matthieu.stigler at gmail.com> wrote:
>>>>>> 
>>>>>>
>>>>>> 
>>>>>>> So it is now working for the local computer with. However, when trying to
>>>>>>> use the external computer, it seems to be working but nothing happens after
>>>>>>> it asked for the last password...
>>>>>>>
>>>>>>> 
>>>>>> What this tells you is that "something went wrong".  The basic strategy
>>>>>> in this case is to use the "outfile" option to hopefully capture an error
>>>>>> message.  You might need to set outfile differently for different slaves,
>>>>>> particularly if you're starting more than one on the same machine, but I
>>>>>> suggest just starting one slave on 210 to avoid the issue.  So do
>>>>>> something like:
>>>>>> 
>>>>>>    host210 <- list(host = "mat@192.100.100.210",
>>>>>>                    rscript = "/usr/bin/Rscript",
>>>>>>                    outfile = "/tmp/log.txt")
>>>>>>    cl2 <- makeCluster(list(host210), type = "SOCK")
>>>>>> 
>>>>> Ok, thanks for pointing out this method.
>>>>> 
>>>>> I tried it and got the following error message. This does not seem to be
>>>>> computer specific (I tried it with another host, 213, and from host 213
>>>>> to 212; always the same error message):
>>>>> 
>>>>> starting worker for ubuntu:10187
>>>>> Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>>>>>   unable to open connection
>>>>> Calls: local ... slaveLoop -> recvData -> makeSOCKmaster -> socketConnection
>>>>> In addition: Warning message:
>>>>> In socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>>>>>   ubuntu:10187 cannot be opened
>>>>> Execution halted
>>>>> 
>>>>> 
>>>>> Is it related to ssh or snow? I did not find any reference to that problem
>>>>> when googling for it...
>>>>> 
>>>> It is an issue with your ability to make a socket connection to the
>>>> master. Most likely the master computer has a firewall that is
>>>> blocking connections to the port snow uses.  Try turning the firewall
>>>> off or at least enabling the port in the error message.
>>>> A simple test is to do
>>>>
>>>>    socketConnection(port = 10187, server = TRUE)
>>>> 
>>>> in an R session on the master and
>>>>
>>>>    telnet ubuntu 10187
>>>> 
>>>> in a shell on your worker machine (assuming your master is called
>>>> ubuntu) (or you can use R and
>>>>
>>>>    socketConnection("ubuntu", port = 10187)
>>>> 
>>>> in an R session on the worker).
>>>> 
>>>> luke
>>>>
>>>> 
>>> Thanks Luke and Dirk for your help!
>>> 
>>> I don't think it is a firewall error, as both machines have all ports open
>>> (the default with iptables, as I understand it), and the network admin
>>> even opened port 10187.
>>> 
>>> I first tried the three solutions suggested; none of them seems to give
>>> good results:
>>> 
>>> $ telnet 192.100.100.212 10187
>>> Trying 192.100.100.212...
>>> telnet: Unable to connect to remote host: Connection refused
>>> 
>>> R> socketConnection(port = 10187, server = TRUE)
>>> # nothing happens... is that right?
>>> 
>>> R> socketConnection("192.100.100.212", port = 10187)
>>> Error in socketConnection("192.100.100.212", port = 10187) :
>>>   unable to open connection
>>> In addition: Warning message:
>>> In socketConnection("192.100.100.212", port = 10187) :
>>>   192.100.100.212:10187 cannot be opened
>>> Same error message when using "ubuntu"/ dsge at 192.100.100.212 etc..
>>> 
>>> On an Ubuntu forum, someone told me that one has to start a server
>>> listening on the port (sorry, my explanations are not good as I don't
>>> understand the subject that well :-( ).
>>> So launching on the master (212):
>>> 
>>> $nc -l -p 10187
>>> 
>>> 
>>> then on 210 one gets:
>>> 
>>> $telnet 192.100.100.212 10187
>>> 
>>> Trying 192.100.100.212...
>>> 
>>> Connected to 192.100.100.212.
>>> 
>>> Escape character is '^]'.
>>> 
>>> So it seems that it is working, but this has no effect on the previous
>>> commands: socketConnection and makeCluster still claim that 10187 can't
>>> be opened.
>>> 
>>> With those elements, do you guys see things more clearly, or is it even
>>> darker? Thanks a lot for your help!
>>> 
>>> Matthieu
>>>
>>> 
>>>>> Thanks a lot for your help!!
>>>>> 
>>>>>> If it hangs, go to another terminal, ssh to 192.100.100.210, and look at
>>>>>> the contents of /tmp/log.txt, and hopefully that will provide a clue to
>>>>>> the problem.
>>>>>> 
>>>>>> Another approach is to use the "manual" option.  That will print the
>>>>>> command that you should use to manually start each of the slaves.
>>>>>> You just ssh to that machine from another terminal, and cut and paste
>>>>>> the printed command to start the slave.  If you set "outfile" to an
>>>>>> empty string, then output messages will go right to that terminal.
>>>>>> 
>>>>>>
>>>>>>
>>>>>
>>> 
>
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

