[R-sig-hpc] snow, socket cluster: problem with path to rscript

Steve Weston steve at revolution-computing.com
Thu Apr 16 21:06:27 CEST 2009


I just noticed that you're running R 2.7.1 on your 192.100.100.212
machine.  I believe there are known socketConnection issues
with that version of R that Luke fixed as of R 2.7.2.  So I strongly
suggest that you upgrade your version of R.
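
A quick way to confirm which version each machine is running is to compare
getRversion() against 2.7.2 in an R session on that machine.  This is only a
minimal sketch: the helper name needs_upgrade is made up for illustration,
and 2.7.2 is simply the release mentioned above.

```r
# Hypothetical helper (not part of R or snow): TRUE if the given R version
# is older than the release assumed to contain the socketConnection fix.
needs_upgrade <- function(version = getRversion(), fixed = "2.7.2") {
  version < package_version(fixed)
}

# Run this in an R session on each machine (master and workers).
cat("Running", R.version.string, "\n")
if (needs_upgrade()) {
  cat("This R is older than 2.7.2; consider upgrading.\n")
}
```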

--
Steve Weston
REvolution Computing
One Century Tower | 265 Church Street, Suite 1006
New Haven, CT  06510
P: 203-777-7442 x266 | www.revolution-computing.com


On Thu, Apr 16, 2009 at 4:52 AM, Matthieu Stigler
<matthieu.stigler at gmail.com> wrote:
> luke at stat.uiowa.edu wrote:
>>
>> On Wed, 15 Apr 2009, Matthieu Stigler wrote:
>>
>>> Steve Weston wrote:
>>>>
>>>> On Tue, Apr 14, 2009 at 5:29 AM, Matthieu Stigler
>>>> <matthieu.stigler at gmail.com> wrote:
>>>>
>>>>
>>>>> So it is now working for the local computer. However, when trying to
>>>>> use the external computer, it seems to be working, but nothing happens
>>>>> after it asks for the last password...
>>>>>
>>>>
>>>> This tells you that "something went wrong".  The basic strategy in
>>>> this case is to use the "outfile" option to hopefully capture an
>>>> error message.  You might need to set outfile differently for
>>>> different slaves, particularly if you're starting more than one on
>>>> the same machine, but I suggest just starting one slave on 210 to
>>>> avoid the issue.  So do something like:
>>>>
>>>>
>>>>   host210 <- list(host = "mat@192.100.100.210",
>>>>                   rscript = "/usr/bin/Rscript",
>>>>                   outfile = "/tmp/log.txt")
>>>>   cl2 <- makeCluster(list(host210), type = "SOCK")
>>>>
>>>>
>>> Ok, thanks for pointing out this method.
>>>
>>> I tried it and got the following error message. It does not seem to be
>>> computer specific (I also tried it with another host, 213, and from host
>>> 213 to 212, and always got the same error message):
>>>
>>> starting worker for ubuntu:10187
>>> Error in socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>>>   unable to open connection
>>> Calls: local ... slaveLoop -> recvData -> makeSOCKmaster -> socketConnection
>>> In addition: Warning message:
>>> In socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>>>   ubuntu:10187 cannot be opened
>>> Execution halted
>>>
>>>
>>> Is it related to ssh or snow? I did not find any reference to that
>>> problem when googling for it...
>>
>> It is an issue with your ability to make a socket connection to the
>> master. Most likely the master computer has a firewall that is
>> blocking connections to the port snow uses.  Try turning the firewall
>> off or at least enabling the port in the error message.
>> A simple test is to do
>>
>>    socketConnection(port = 10187, server = TRUE)
>>
>> in an R session on the master and
>>
>>    telnet ubuntu 10187
>>
>> in a shell on your worker machine (assuming your master is called
>> ubuntu), or you can use R and
>>
>>    socketConnection("ubuntu", port = 10187)
>>
>> in an R session on the worker).
>>
>> luke
>>
>
> Thanks Luke and Dirk for your help!
>
> I don't think it is a firewall error, as both machines have all ports open
> (the default with iptables, as I understand it), and the network admin
> even opened port 10187.
>
> I first tried the three tests suggested; none of them seems to give good
> results:
>
> $ telnet 192.100.100.212 10187
> Trying 192.100.100.212...
> telnet: Unable to connect to remote host: Connection refused
>
> R> socketConnection(port = 10187, server = TRUE)
> # nothing happens... is that right?
>
> R> socketConnection("192.100.100.212", port = 10187)
> Error in socketConnection("192.100.100.212", port = 10187) :
>   cannot open the connection
> In addition: Warning message:
> In socketConnection("192.100.100.212", port = 10187) :
>   192.100.100.212:10187 cannot be opened
>
> I get the same error message when using "ubuntu", dsge@192.100.100.212, etc.
>
> On an Ubuntu forum, someone told me that one has to open a server on the
> port (sorry, my explanations are not good, as I don't understand the
> subject that well :-( ).
> So, launching on the master (212):
>
> $ nc -l -p 10187
>
> one is then able to connect from 210:
>
> $ telnet 192.100.100.212 10187
> Trying 192.100.100.212...
> Connected to 192.100.100.212.
> Escape character is '^]'.
>
> So it seems that it is working, but this has no effect on the previous
> commands: socketConnection and makeCluster still claim that 10187 can't be
> opened.
>
> With this information, do you see things more clearly, or is it even
> murkier? Thanks a lot for your help!
>
> Matthieu
>
>>>
>>> Thanks a lot for your help!!
>>>>
>>>> If it hangs, go to another terminal, ssh to 192.100.100.210, and look at
>>>> the contents of /tmp/log.txt, and hopefully that will provide a clue to
>>>> the problem.
>>>>
>>>> Another approach is to use the "manual" option.  That will print the
>>>> command that you should use to manually start each of the slaves.
>>>> You just ssh to that machine from another terminal, and cut and paste
>>>> the printed command to start the slave.  If you set "outfile" to an
>>>> empty string, then output messages will go right to that terminal.
>>>>
>>>>
>>>
>>>
>>
>
>
