[R-sig-hpc] snow, socket cluster: problem with path to rscript

luke at stat.uiowa.edu luke at stat.uiowa.edu
Mon Apr 20 16:19:40 CEST 2009


Glad it is working now.

By default snow uses Sys.info()["nodename"] on the master to determine
the name of the master that is used for the back connection. If you
supply an alternative as master="123...." with the IP address, then that
should work around not having the master name known on the worker.
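
A minimal sketch of that override, assuming snow is installed for the commented-out call. `choose_master` is a hypothetical helper for illustration, not part of snow, and the addresses are the ones quoted in this thread.

```r
# Default name the master advertises to workers, as described above.
default_master <- unname(Sys.info()["nodename"])

# Hypothetical helper: use an explicit address when the workers cannot
# resolve the master's node name, otherwise fall back to the default.
choose_master <- function(ip = NULL) {
  if (is.null(ip)) default_master else ip
}

master <- choose_master("192.100.100.212")

# With snow installed, the override would be passed along these lines
# (not run here, since it needs ssh access to the worker):
# library(snow)
# cl <- makeCluster("192.100.100.210", type = "SOCK", master = master)
```

Passing the IP address avoids any dependence on name resolution on the worker side.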

luke

On Mon, 20 Apr 2009, Matthieu Stigler wrote:

> I solved the problem ;-)
>
> I think the issue comes from snow expecting the name of the master to be
> listed in /etc/hosts (typically as "IP master_name") on the workers. In my
> case it wasn't working before, and since I added that entry it is
> working!
>
> The mistake I made was to check only with
>
> #open from the master (212):
> socketConnection(port = 10187, server = TRUE)
>
> #and from the slave
> socketConnection("192.100.100.212", port = 10187) #from 210
>
> # Even if this works, it is not a sufficient condition for snow to work;
> # indeed, the following has to work as well:
> socketConnection("master_name", port = 10187) #from 210
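
One way to check the /etc/hosts condition described above is sketched here. `host_listed` is a hypothetical helper, and the example entries (including the name "ubuntu" for the master) are assumptions based on the addresses in this thread.

```r
# Hypothetical helper: does any non-comment line of a hosts file list `name`
# as a host name or alias (i.e. in any field after the IP address)?
host_listed <- function(lines, name) {
  lines <- sub("#.*$", "", lines)                      # drop comments
  fields <- strsplit(trimws(lines), "[[:space:]]+")    # split into fields
  any(vapply(fields,
             function(f) length(f) >= 2 && name %in% f[-1],
             logical(1)))
}

# Example content standing in for a worker's /etc/hosts.
example_hosts <- c("127.0.0.1 localhost",
                   "192.100.100.212 ubuntu  # master")

found <- host_listed(example_hosts, "ubuntu")

# On a real worker one might run:
# host_listed(readLines("/etc/hosts"), "ubuntu")
```

This only inspects the hosts file; a name resolved through DNS would not show up here.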
>
>
>
> Thanks a lot to Dirk, Luke and Steve, who helped me a lot in
> finding this!!
>
> Matthieu
>
> luke at stat.uiowa.edu wrote:
>> On Fri, 17 Apr 2009, Matthieu Stigler wrote:
>> 
>>> Steve Weston wrote:
>>>> I just noticed that you're running R 2.7.1 on your 192.100.100.212
>>>> machine.  I believe there are known socketConnection issues
>>>> with that version of R that Luke fixed as of R 2.7.2.  So I strongly
>>>> suggest that you upgrade your version of R.
>>>> 
>>> I upgraded to R 2.8 but unfortunately this doesn't change anything; port
>>> 10187 is still reported as closed...
>>> 
>>> I obviously have a problem opening the port; should I perhaps post on the
>>> Debian list or on other forums instead? I use nc -l -p 10187, so that
>>> telnet
>> 
>> According to my man page that argument combination is not legal so I
>> don't know what you actually did.
>> 
>>> xxx.212 10187 is working (I did it on both machines), but when running
>>> makeCluster I still have that issue, and also when running from the worker:
>>> 
>>> socketConnection("ubuntu", port = 10187)
>>> 192.100.100.212:10187 cannot be opened
>>> 
>>> 
>>> and with:
>>> 
>>> socketConnection(port = 10187, server = TRUE)
>>> 
>>> nothing happens. What is actually the expected output?
>> 
>> The server call waits until a connection occurs and then returns an R
>> connection object.  The client socketConnection call returns a socket
>> connection if successful and gives an error message if not.
>> 
>> So on the master do
>> 
>> s <- socketConnection(port = 10187, server = TRUE)
>> 
>> and this will wait for a connection and return to the prompt when a
>> connection occurs.  On the worker machine
>> 
>> telnet master 10187
>> 
>> will either succeed and wait until the server socket is closed or fail
>> with an error message about not being able to open the port.  If I use
>> 
>> nc master 10187
>> 
>> then on a successful connection nc waits (for input) until the server
>> closes the socket with close(s) and then returns to the shell prompt.
>> Failure for me is an immediate return to the shell prompt, no error
>> message (and the server side continues to wait).
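
The client side of this test can also be run from R on the worker. The following is a sketch only; `can_connect` is a hypothetical helper, and the host name and port in the commented call are the ones used in this thread.

```r
# Hypothetical helper: TRUE if a client socket to host:port can be opened,
# FALSE if socketConnection() signals an error (e.g. connection refused).
can_connect <- function(host, port, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host, port = port, blocking = TRUE,
                       open = "a+b", timeout = timeout)
    ),
    error = function(e) NULL
  )
  if (is.null(con)) return(FALSE)
  close(con)   # release the socket again; we only wanted to probe it
  TRUE
}

# On the worker, while the master is waiting in
#   s <- socketConnection(port = 10187, server = TRUE)
# one might run:
# can_connect("ubuntu", 10187)
```

Using the master's name rather than its IP address here also exercises name resolution on the worker, which is what snow relies on by default.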
>> 
>> luke
>> 
>>> 
>>> Thanks a lot for your help and advice!!!
>>> 
>>> Mat
>>>> -- 
>>>> Steve Weston
>>>> REvolution Computing
>>>> One Century Tower | 265 Church Street, Suite 1006
>>>> New Haven, CT  06510
>>>> P: 203-777-7442 x266 | www.revolution-computing.com
>>>> 
>>>> 
>>>> On Thu, Apr 16, 2009 at 4:52 AM, Matthieu Stigler
>>>> <matthieu.stigler at gmail.com> wrote:
>>>> 
>>>>> luke at stat.uiowa.edu wrote:
>>>>> 
>>>>>> On Wed, 15 Apr 2009, Matthieu Stigler wrote:
>>>>>> 
>>>>>> 
>>>>>>> Steve Weston wrote:
>>>>>>> 
>>>>>>>> On Tue, Apr 14, 2009 at 5:29 AM, Matthieu Stigler
>>>>>>>> <matthieu.stigler at gmail.com> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> So it is now working for the local computer. However, when trying to
>>>>>>>>> use the external computer, it seems to be working but nothing happens
>>>>>>>>> after it asks for the last password...
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> This tells you that "something went wrong".  The basic strategy in
>>>>>>>> this case is to use the "outfile" option to hopefully capture an
>>>>>>>> error message.  You might need to set outfile differently for
>>>>>>>> different slaves, particularly if you're starting more than one on
>>>>>>>> the same machine, but I suggest just starting one slave on 210 to
>>>>>>>> avoid the issue.  So do something like:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> > host210 <- list(host = "mat@192.100.100.210", rscript = "/usr/bin/Rscript",
>>>>>>>> +                 outfile = "/tmp/log.txt")
>>>>>>>> > cl2 <- makeCluster(list(host210), type = "SOCK")
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> OK, thanks for pointing out this method.
>>>>>>> 
>>>>>>> I tried it and got the following error message. It does not seem to be
>>>>>>> computer-specific (I tried it with another host, 213, and from host 213
>>>>>>> to 212, always with the same error message):
>>>>>>> 
>>>>>>> starting worker for ubuntu:10187
>>>>>>> Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>>>>>>>   unable to open connection
>>>>>>> Calls: local ... slaveLoop -> recvData -> makeSOCKmaster -> socketConnection
>>>>>>> In addition: Warning message:
>>>>>>> In socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>>>>>>>   ubuntu:10187 cannot be opened
>>>>>>> Execution halted
>>>>>>> 
>>>>>>> 
>>>>>>> Is it related to ssh or snow? I did not find any reference to that
>>>>>>> problem when googling for it...
>>>>>>> 
>>>>>> It is an issue with your ability to make a socket connection to the
>>>>>> master. Most likely the master computer has a firewall that is
>>>>>> blocking connections to the port snow uses.  Try turning the firewall
>>>>>> off or at least enabling the port in the error message.
>>>>>> A simple test is to do
>>>>>>
>>>>>>    socketConnection(port = 10187, server = TRUE)
>>>>>> 
>>>>>> in an R session on the master and
>>>>>>
>>>>>>    telnet ubuntu 10187
>>>>>> 
>>>>>> in a shell on your worker machine (assuming your master is called
>>>>>> ubuntu) (or you can use R and
>>>>>>
>>>>>>    socketConnection("ubuntu", port = 10187)
>>>>>> 
>>>>>> in an R session on the worker).
>>>>>> 
>>>>>> luke
>>>>>> 
>>>>>> 
>>>>> Thanks Luke and Dirk for your help!
>>>>> 
>>>>> I don't think it is a firewall error, as both machines have all ports
>>>>> open (the default with iptables, as I understand it), and the network
>>>>> admin even opened port 10187.
>>>>> 
>>>>> I first tried the three tests suggested; none of them seems to give
>>>>> good results:
>>>>> 
>>>>> $ telnet 192.100.100.212 10187
>>>>> Trying 192.100.100.212...
>>>>> telnet: Unable to connect to remote host: Connection refused
>>>>> 
>>>>> R> socketConnection(port = 10187, server = TRUE)
>>>>> # nothing happens... is that right?
>>>>> 
>>>>> R> socketConnection("192.100.100.212", port = 10187)
>>>>> Error in socketConnection("192.100.100.212", port = 10187) :
>>>>>   cannot open the connection
>>>>> In addition: Warning message:
>>>>> In socketConnection("192.100.100.212", port = 10187) :
>>>>>   192.100.100.212:10187 cannot be opened
>>>>> 
>>>>> Same error message when using "ubuntu", dsge@192.100.100.212, etc.
>>>>> 
>>>>> On an Ubuntu forum, someone told me that one has to start a server
>>>>> listening on the port (sorry, my explanations are not good as I don't
>>>>> understand the subject that well :-( ).
>>>>> So, launching on the master (212):
>>>>> 
>>>>> $ nc -l -p 10187
>>>>> 
>>>>> one is then able to get, on 210:
>>>>> 
>>>>> $ telnet 192.100.100.212 10187
>>>>> Trying 192.100.100.212...
>>>>> Connected to 192.100.100.212.
>>>>> Escape character is '^]'.
>>>>> 
>>>>> So it seems that it is working, but this has no effect on the previous
>>>>> commands: socketConnection and makeCluster still claim that 10187 can't
>>>>> be opened.
>>>>> 
>>>>> With those elements, do you see things more clearly, or is it even
>>>>> darker? Thanks a lot for your help!
>>>>> 
>>>>> Matthieu
>>>>> 
>>>>> 
>>>>>>> Thanks a lot for your help!!
>>>>>>> 
>>>>>>>> If it hangs, go to another terminal, ssh to 192.100.100.210, and look
>>>>>>>> at the contents of /tmp/log.txt, and hopefully that will provide a
>>>>>>>> clue to the problem.
>>>>>>>> 
>>>>>>>> Another approach is to use the "manual" option.  That will print the
>>>>>>>> command that you should use to manually start each of the slaves.
>>>>>>>> You just ssh to that machine from another terminal, and cut and paste
>>>>>>>> the printed command to start the slave.  If you set "outfile" to an
>>>>>>>> empty string, then output messages will go right to that terminal.
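
The two debugging setups described here (a log file, or manual startup with output on the terminal) might be written as follows. This is a sketch assuming snow is installed for the commented-out calls; the `user@host` form of the host entry is assumed from the earlier example, and `manual_spec` is just an illustrative name.

```r
# Per-host spec with a log file that captures the worker's messages.
host210 <- list(host = "mat@192.100.100.210",
                rscript = "/usr/bin/Rscript",
                outfile = "/tmp/log.txt")

# Variant for manual startup: snow prints the command to run on the
# worker, and an empty outfile sends messages to that terminal.
manual_spec <- modifyList(host210, list(manual = TRUE, outfile = ""))

# With snow installed (not run here, since it needs the remote machine):
# library(snow)
# cl2 <- makeCluster(list(host210), type = "SOCK")      # log to /tmp/log.txt
# cl2 <- makeCluster(list(manual_spec), type = "SOCK")  # start worker by hand
```

Starting a single worker this way keeps the log file unambiguous, as suggested earlier in the thread.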
>>>>>>>> 
>>>>>>>> -- 
>>>>>>>> Steve Weston
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>>> 
>> 
>
>

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

