[R-sig-hpc] snow, socket cluster: problem with path to rscript

Matthieu Stigler matthieu.stigler at gmail.com
Mon Apr 20 14:34:39 CEST 2009


I solved the problem ;-)

I think the issue is that snow expects the master's name to be known 
on the workers, i.e. listed in their /etc/hosts (typically as a line of 
the form "IP master_name"). In my case it wasn't working before, and 
since I added that entry it works!
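
For anyone hitting the same thing, here is a minimal sketch of how one 
could check from R, on a worker, whether the hosts file maps a given name. 
The helper name hosts_has_name is made up for illustration and is not part 
of snow or base R. It only looks at /etc/hosts; a name can of course also 
be resolved through DNS, which this does not test.

```r
# Hypothetical helper (not part of snow or base R):
# does a hosts file contain an entry for `name`?
# Each non-comment line has the form "IP name [alias ...]".
hosts_has_name <- function(name, hosts_file = "/etc/hosts") {
  lines <- readLines(hosts_file, warn = FALSE)
  lines <- sub("#.*$", "", lines)   # strip comments
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]     # drop blank lines
  for (fields in strsplit(lines, "[[:space:]]+")) {
    # fields[1] is the IP address; the rest are host names and aliases
    if (name %in% fields[-1]) return(TRUE)
  }
  FALSE
}
```

On a worker, hosts_has_name("master_name") should then return TRUE once 
the entry is in place ("master_name" being a placeholder for the actual 
name of the master).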

My mistake was to check only with the IP address:

# on the master (212):
socketConnection(port = 10187, server = TRUE)

# and on the slave (210):
socketConnection("192.100.100.212", port = 10187)

# Even if this works, it is not a sufficient condition for snow to work.
# The following, run from 210, also has to work:
socketConnection("master_name", port = 10187)
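
The two checks can be wrapped in a small function that returns TRUE or 
FALSE instead of stopping with an error. This is only a sketch: can_connect 
is a made-up name, the 5 second timeout is an arbitrary choice, and 
"master_name" stands for whatever your master is called. The open = "a+b" 
mode is the one that appears in snow's error message.

```r
# Hypothetical helper: TRUE if a client socket to host:port can be opened.
can_connect <- function(host, port, timeout = 5) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host, port = port, blocking = TRUE,
                       open = "a+b", timeout = timeout)
    ),
    error = function(e) NULL   # connection refused, unknown host, ...
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

# Run on the worker while the master is listening, e.g. after
#   s <- socketConnection(port = 10187, server = TRUE)
# on the master:
# can_connect("192.100.100.212", 10187)  # by IP address
# can_connect("master_name", 10187)      # by name: this is what snow needs
```

Since the server call on the master returns as soon as one connection has 
been made, it has to be started again before the second check.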



Many thanks to Dirk, Luke and Steve, who helped me a lot in finding 
this!

Matthieu

luke at stat.uiowa.edu wrote:
> On Fri, 17 Apr 2009, Matthieu Stigler wrote:
>
>> Steve Weston wrote:
>>> I just noticed that you're running R 2.7.1 on your 192.100.100.212
>>> machine.  I believe there are known socketConnection issues
>>> with that version of R that Luke fixed as of R 2.7.2.  So I strongly
>>> suggest that you upgrade your version of R.
>>>
>> I upgraded to R 2.8 but unfortunately this doesn't change anything: 
>> port 10187 is still reported as closed...
>>
>> I obviously have a problem opening the port; should I perhaps post 
>> on the Debian list or on other forums instead? I use nc -l -p 10187, so 
>> that telnet
>
> According to my man page that argument combination is not legal so I
> don't know what you actually did.
>
>> xxx.212 10187 works (I did this on both machines), but I still have 
>> the issue when running makeCluster, and also when running from the worker:
>>
>> socketConnection("ubuntu", port = 10187)
>> 192.100.100.212:10187 cannot be opened
>>
>>
>> and with:
>>
>> socketConnection(port = 10187, server = TRUE)
>>
>> nothing happens, what is actually the expected output?
>
> the server call waits until a connection occurs and then returns an R
> connection object.  The client socketConnection call returns a socket
> connection if successful and gives an error message if not.
>
> So on the master do
>
> s <- socketConnection(port = 10187, server = TRUE)
>
> and this will wait for a connection and return to the prompt when a
> connection occurs.  On the worker machine
>
> telnet master 10187
>
> will either succeed and wait until the server socket is closed or fail
> with an error message about not being able to open the port.  If I use
>
> nc master 10187
>
> then on a successful connection nc waits (for input) until the server
> closes the socket with close(s) and then returns to the shell prompt.
> Failure for me is an immediate return to the shell prompt, no error
> message (and the server side continues to wait).
>
> luke
>
>>
>> Thanks a lot for your help and advice!!!
>>
>> Mat
>>> -- 
>>> Steve Weston
>>> REvolution Computing
>>> One Century Tower | 265 Church Street, Suite 1006
>>> New Haven, CT  06510
>>> P: 203-777-7442 x266 | www.revolution-computing.com
>>>
>>>
>>> On Thu, Apr 16, 2009 at 4:52 AM, Matthieu Stigler
>>> <matthieu.stigler at gmail.com> wrote:
>>>
>>>> luke at stat.uiowa.edu wrote:
>>>>
>>>>> On Wed, 15 Apr 2009, Matthieu Stigler wrote:
>>>>>
>>>>>
>>>>>> Steve Weston wrote:
>>>>>>
>>>>>>> On Tue, Apr 14, 2009 at 5:29 AM, Matthieu Stigler
>>>>>>> <matthieu.stigler at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> So it is now working for the local computer. However, when
>>>>>>>> trying to use the external computer, it seems to be working but
>>>>>>>> nothing happens after it asks for the last password...
>>>>>>>>
>>>>>>>>
>>>>>>> This tells you that "something went wrong".  The basic strategy in
>>>>>>> this case is to use the "outfile" option to hopefully capture an
>>>>>>> error message.  You might need to set outfile differently for
>>>>>>> different slaves, particularly if you're starting more than one on
>>>>>>> the same machine, but I suggest just starting one slave on 210 to
>>>>>>> avoid the issue.  So do something like:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> > host210 <- list(host = "mat@192.100.100.210",
>>>>>>> +                 rscript = "/usr/bin/Rscript",
>>>>>>> +                 outfile = "/tmp/log.txt")
>>>>>>> > cl2 <- makeCluster(list(host210), type = "SOCK")
>>>>>>>
>>>>>> Ok, thanks for pointing out this method.
>>>>>>
>>>>>> I tried it and got the following error message. It does not seem
>>>>>> to be computer specific (I tried it towards another host, 213, and
>>>>>> from host 213 to 212, always with the same error message):
>>>>>>
>>>>>> starting worker for ubuntu:10187
>>>>>> Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>>>>>>   unable to open connection
>>>>>> Calls: local ... slaveLoop -> recvData -> makeSOCKmaster -> socketConnection
>>>>>> In addition: Warning message:
>>>>>> In socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>>>>>>   ubuntu:10187 cannot be opened
>>>>>> Execution halted
>>>>>>
>>>>>>
>>>>>> Is it related to ssh or snow? I did not find any reference to
>>>>>> that problem when googling for it...
>>>>>>
>>>>> It is an issue with your ability to make a socket connection to the
>>>>> master. Most likely the master computer has a firewall that is
>>>>> blocking connections to the port snow uses.  Try turning the firewall
>>>>> off or at least enabling the port in the error message.
>>>>> A simple test is to do
>>>>>
>>>>>    socketConnection(port = 10187, server = TRUE)
>>>>>
>>>>> in an R session on the master and
>>>>>
>>>>>    telnet ubuntu 10187
>>>>>
>>>>> in a shell on your worker machine (assuming your master is called
>>>>> ubuntu) (or you can use R and
>>>>>
>>>>>    socketConnection("ubuntu", port = 10187)
>>>>>
>>>>> in an R session on the worker).
>>>>>
>>>>> luke
>>>>>
>>>>>
>>>> Thanks Luke and Dirk for your help!
>>>>
>>>> I don't think it is a firewall error, as both machines have all
>>>> ports open (the default with iptables, as I understand it), and the
>>>> network admin even opened port 10187 explicitly.
>>>>
>>>> I first tried the three solutions suggested; none of them seems to
>>>> give good results:
>>>>
>>>> $telnet 192.100.100.212 10187
>>>>
>>>> Trying 192.100.100.212...
>>>>
>>>> telnet: Unable to connect to remote host: Connection refused
>>>>
>>>> R>socketConnection(port = 10187, server=TRUE)
>>>>
>>>> #nothing happens... is it right?
>>>>
>>>>
>>>> R > socketConnection("192.100.100.212", port = 10187)
>>>> Error in socketConnection("192.100.100.212", port = 10187) :
>>>>  unable to open connection
>>>>
>>>> In addition: Warning message:
>>>>
>>>> In socketConnection("192.100.100.212", port = 10187) :
>>>>
>>>>  192.100.100.212:10187 cannot be opened
>>>>
>>>> Same error message when using "ubuntu"/ dsge at 192.100.100.212 etc..
>>>>
>>>> On an Ubuntu forum, someone told me that one has to open a server
>>>> on the port (sorry, my explanations are not good as I don't
>>>> understand the subject that well :-( ).
>>>> So, launching on the master (212):
>>>>
>>>> $nc -l -p 10187
>>>>
>>>>
>>>> then on 210 one gets:
>>>>
>>>> $telnet 192.100.100.212 10187
>>>>
>>>> Trying 192.100.100.212...
>>>>
>>>> Connected to 192.100.100.212.
>>>>
>>>> Escape character is '^]'.
>>>>
>>>> So it seems to be working, but it has no effect on the previous
>>>> commands: socketConnection and makeCluster still claim that 10187
>>>> can't be opened.
>>>>
>>>> With this information, do things look clearer to you, or even
>>>> murkier? Thanks a lot for your help!
>>>>
>>>> Matthieu
>>>>
>>>>
>>>>>> Thanks a lot for your help!!
>>>>>>
>>>>>>> If it hangs, go to another terminal, ssh to 192.100.100.210, and
>>>>>>> look at the contents of /tmp/log.txt, and hopefully that will
>>>>>>> provide a clue to the problem.
>>>>>>>
>>>>>>> Another approach is to use the "manual" option.  That will print
>>>>>>> the command that you should use to manually start each of the
>>>>>>> slaves.  You just ssh to that machine from another terminal, and
>>>>>>> cut and paste the printed command to start the slave.  If you set
>>>>>>> "outfile" to an empty string, then output messages will go right
>>>>>>> to that terminal.
>>>>>>>
>>>>>>> -- 
>>>>>>> Steve Weston
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>>
>


