[R-sig-hpc] snow, socket cluster: problem with path to rscript
Matthieu Stigler
matthieu.stigler at gmail.com
Fri Apr 17 16:40:43 CEST 2009
Steve Weston wrote:
> I just noticed that you're running R 2.7.1 on your 192.100.100.212
> machine. I believe there are known socketConnection issues
> with that version of R that Luke fixed as of R 2.7.2. So I strongly
> suggest that you upgrade your version of R.
>
I upgraded to R 2.8, but unfortunately nothing changed: port 10187 is
still reported as closed.
I obviously have a problem opening the port; should I rather post on the
Debian list or on other forums? When I run nc -l -p 10187, telnet
xxx.212 10187 works (I tried it on both machines), but makeCluster still
has the same issue, and so does this call from the worker:
socketConnection("ubuntu", port = 10187)
192.100.100.212:10187 cannot be opened
and with:
socketConnection(port = 10187, server = TRUE)
nothing happens. What is the expected output?
Thanks a lot for your help and advice!
Mat
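
On the question of what socketConnection(port = 10187, server = TRUE)
should print: as far as I understand the R documentation, a server socket
opened this way waits until a client connects (or the connection timeout
expires), so a silent prompt would be consistent with it simply waiting.
Below is only a minimal loopback sketch of that behaviour, not snow's own
code. It assumes a Unix-like machine with Rscript installed, port 10187
free (so no nc -l running on it at the same time), and that a one-second
delay is enough for the server to start listening. The client_code string
and the "hello" message are made up for illustration.

```r
# Loopback check of socketConnection on a single machine.
port <- 10187  # port from the error message; any free port should do

# Client side, run in a second R process after a short delay so that
# the server below is already listening (the delay is an assumption).
client_code <- sprintf(
  paste0('Sys.sleep(1); ',
         'con <- socketConnection("localhost", port = %d, ',
         'blocking = TRUE, open = "r+"); ',
         'writeLines("hello", con); close(con)'),
  port)
rscript <- file.path(R.home("bin"), "Rscript")
system2(rscript, c("-e", shQuote(client_code)), wait = FALSE)

# Server side: this call waits here, printing nothing, until the
# client connects (or the timeout expires).
srv <- socketConnection(port = port, server = TRUE,
                        blocking = TRUE, open = "r+")
msg <- readLines(srv, n = 1)  # read the one line sent by the client
close(srv)
print(msg)
```

If this works on the master alone but the worker still cannot connect,
the problem is more likely between the two machines than in R itself.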
> --
> Steve Weston
> REvolution Computing
> One Century Tower | 265 Church Street, Suite 1006
> New Haven, CT 06510
> P: 203-777-7442 x266 | www.revolution-computing.com
>
>
> On Thu, Apr 16, 2009 at 4:52 AM, Matthieu Stigler
> <matthieu.stigler at gmail.com> wrote:
>
>> luke at stat.uiowa.edu wrote:
>>
>>> On Wed, 15 Apr 2009, Matthieu Stigler wrote:
>>>
>>>
>>>> Steve Weston wrote:
>>>>
>>>>> On Tue, Apr 14, 2009 at 5:29 AM, Matthieu Stigler
>>>>> <matthieu.stigler at gmail.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> So it is now working for the local computer. However, when trying to
>>>>>> use the external computer, it seems to be working, but nothing happens
>>>>>> after it asks for the last password...
>>>>>>
>>>>>>
>>>>> This tells you that "something went wrong". The basic strategy in
>>>>> this case is to use the "outfile" option to hopefully capture an
>>>>> error message. You might need to set outfile differently for
>>>>> different slaves, particularly if you're starting more than one on
>>>>> the same machine, but I suggest just starting one slave on 210 to
>>>>> avoid the issue. So do something like:
>>>>>
>>>>>
>>>>>
>>>>> > host210 <- list(host = "mat@192.100.100.210", rscript = "/usr/bin/Rscript",
>>>>> +                 outfile = "/tmp/log.txt")
>>>>> > cl2 <- makeCluster(list(host210), type = "SOCK")
>>>>>>
>>>>>>
>>>>>
>>>> OK, thanks for pointing out this method.
>>>>
>>>> I tried it and got the following error message. This does not seem to be
>>>> computer specific (I tried it to the other host 213, and from the other
>>>> host 213 to 212, always with the same error message):
>>>>
>>>> starting worker for ubuntu:10187 Error in socketConnection(master, port =
>>>> port, blocking = TRUE, open = "a+b") : unable to open connection
>>>>
>>>> Calls: local ... slaveLoop -> recvData -> makeSOCKmaster ->
>>>> socketConnection
>>>>
>>>> In addition: Warning message:
>>>>
>>>> In socketConnection(master, port = port, blocking = TRUE, open = "a+b") :
>>>>
>>>> ubuntu:10187 cannot be opened
>>>>
>>>> Execution halted
>>>>
>>>>
>>>> Is it related to ssh or snow? I did not find any reference to that
>>>> problem when googling for it...
>>>>
>>> It is an issue with your ability to make a socket connection to the
>>> master. Most likely the master computer has a firewall that is
>>> blocking connections to the port snow uses. Try turning the firewall
>>> off or at least enabling the port in the error message.
>>> A simple test is to do
>>>
>>> socketConnection(port = 10187, server = TRUE)
>>>
>>> in an R session on the master and
>>>
>>> telnet ubuntu 10187
>>>
>>> in a shell on your worker machine (assuming your master is called
>>> ubuntu) (or you can use R and
>>>
>>> socketConnection("ubuntu", port = 10187)
>>>
>>> in an R session on the worker).
>>>
>>> luke
>>>
>>>
>> Thanks Luke and Dirk for your help!
>>
>> I don't think it is a firewall error, as both machines have all ports open
>> (the default with iptables, as I understand it), and the network admin
>> even opened port 10187.
>>
>> I first tried the three tests suggested; none of them seems to give good
>> results:
>>
>> $telnet 192.100.100.212 10187
>>
>> Trying 192.100.100.212...
>>
>> telnet: Unable to connect to remote host: Connection refused
>>
>> R>socketConnection(port = 10187, server=TRUE)
>>
>> #nothing happens... is it right?
>>
>>
>> R > socketConnection("192.100.100.212", port = 10187)
>> Error in socketConnection("192.100.100.212", port = 10187) :
>> unable to open connection
>>
>> In addition: Warning message:
>>
>> In socketConnection("192.100.100.212", port = 10187) :
>>
>> 192.100.100.212:10187 cannot be opened
>>
>> Same error message when using "ubuntu", dsge@192.100.100.212, etc.
>>
>> On an Ubuntu forum, someone told me that one has to start a server
>> listening on the port (sorry, my explanations are not good, as I don't
>> understand the subject that well :-( ).
>> So after launching on the master (212):
>>
>> $nc -l -p 10187
>>
>>
>> one then gets on 210:
>>
>> $telnet 192.100.100.212 10187
>>
>> Trying 192.100.100.212...
>>
>> Connected to 192.100.100.212.
>>
>> Escape character is '^]'.
>>
>> So it seems that it is working, but this has no effect on the previous
>> commands: socketConnection and makeCluster still claim that 10187 can't
>> be opened.
>>
>> With those elements, do you see things more clearly, or is it even
>> darker? Thanks a lot for your help!
>>
>> Matthieu
>>
>>
>>>> Thanks a lot for your help!!
>>>>
>>>>> If it hangs, go to another terminal, ssh to 192.100.100.210, and look at
>>>>> the contents of /tmp/log.txt, and hopefully that will provide a clue to
>>>>> the problem.
>>>>>
>>>>> Another approach is to use the "manual" option. That will print the
>>>>> command that you should use to manually start each of the slaves.
>>>>> You just ssh to that machine from another terminal, and cut and paste
>>>>> the printed command to start the slave. If you set "outfile" to an
>>>>> empty
>>>>> string, then output messages will go right to that terminal.
>>>>>
>>>>> --
>>>>> Steve Weston
>>>>> REvolution Computing
>>>>> One Century Tower | 265 Church Street, Suite 1006
>>>>> New Haven, CT 06510
>>>>> P: 203-777-7442 x266 | www.revolution-computing.com
>>>>>
>>>>>
>>>>
>>