[R] socket problems (maybe bugs?)
Luke Tierney
luke at stat.uiowa.edu
Sat Feb 19 23:49:07 CET 2005
On Sat, 19 Feb 2005, Luke Tierney wrote:
> On Thu, 17 Feb 2005, Christian Lederer wrote:
>
>> Dear R Gurus,
>>
>> for some purpose i have to use a socket connection, where i have to read
>> and write both text and binary data (each binary data package will be
>> preceded by a header line).
>> When experimenting, i encountered some problems (with R-2.0.1 under
>> different Linuxes (SuSE and Gentoo)).
>>
>> Since the default mode for socket connections is non-blocking,
>> i first tried socketSelect() in order to see whether the socket is ready
>> for reading:
>>
>> # Server:
>> s <- socketConnection(port=2222, server=TRUE, open="w+b")
>> writeLines("test", s)
>> writeBin(1:10, s, size=4, endian="big")
>>
>> # Client, variation 1:
>> s <- socketConnection(port=2222, server=FALSE, open="w+b")
>> socketSelect(list(s))
>> readLines(s, n=1) # works, "test" is read
>> socketSelect(list(s)) # never returns, although the server wrote 1:10
>>
>> (This seems to happen only when i mix text and binary reads.)
>> However, without socketSelect(), R may crash if i try to read from an
>> empty socket:
>>
>> # Server:
>> s <- socketConnection(port=2222, server=TRUE, open="w+b")
>> writeLines("test", s)
>> writeBin(1:10, s, size=4, endian="big")
>>
>> # Client, variation 2:
>> s <- socketConnection(port=2222, server=FALSE, open="w+b")
>> readLines(s, n=1) # works, "test" is read
>> readBin(s, "int", size=4, n=10, endian="big") # works, 1:10 is read
>> readBin(s, "int", size=4, n=10, endian="big") # second read leads to
>> # segmentation fault
>>
>> If i omit the endian="big" option, the second read does not crash, but
>> just gets 10 random numbers.
>>
>> At first glance, this does not seem to be a problem, since the
>> data will be preceded by a header, which contains the number of
>> bytes in the binary block.
>> However, due to race conditions, i cannot exclude this situation:
>>
>> time  server         client
>> t0    sends header
>> t1                   reads header
>> t2                   tries to read binary, crashes
>> t3    sends binary
>>
>>
>> If i open the client socket in blocking mode, the second variation seems
>> to work (the second read just blocks as desired).
>> When using only one socket, i can do without socketSelect(), but
>> i have the following questions:
>>
>> 1. Can i be sure that the blocking variation will also work for larger
>> data sets, when e.g. the server starts writing before the client is
>> reading?
>>
>> 2. How could i proceed, if i needed several sockets?
>> Then i cannot use socketSelect due to the problem described in
>> variation 1.
>> I also cannot use blocking sockets, since reading from an empty socket
>> would block the others.
>> Without blocking and socketSelect(), i might run into the race condition
>> described above.
>>
>> In any case, the readBin() crash with endian="big" is a bug in
>> my eyes. For non-blocking sockets, readBin() should just return numeric(0),
>> if no data are written on the socket.
>> I also strongly suspect that the socketSelect() behaviour as described in
>> variation 1 is a bug.
>
> Thanks for the report and the examples. Both issues are bugs.
>
> The crash is due to the fact that a low level routine
> (sock_read_helper) correctly marks the connection as incomplete and
> returns -EAGAIN as its result but the next higher routine (sock_read)
> treats the result as a character count, converts it to unsigned on
> return, and bad things happen at the third level up (do_readbin). I'm
> not quite sure whether the best fix is to change sock_read_helper to
> return 0 or to have sock_read do some checking on the result it gets
> from sock_read_helper.
>
> The issue with socketSelect is that socketSelect ought to return
> immediately if buffered input is available but it does not. As a
> result, when you execute both writes before the first read then the
> read will read all available input and store the part it does not use;
> socketSelect then waits for _additional_ input which never comes.
> This should be fixed in R-devel soon.
>
> I always use blocking reads and writes with sockets--it's a lot easier
> than trying to figure out how to deal with incomplete reads or writes.
> You need to make sure to use a protocol that guarantees that a reader
> will read what a writer writes before the writer needs to move on. If
> you don't then you get deadlock with blocking writes and data large
> enough to fill the buffer. Using non-blocking sockets doesn't cure
> the problem, it just changes the symptoms.
>
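A minimal sketch of such a protocol, assuming blocking connections and a
frame made of a text header line holding the payload byte count, followed
by 4-byte big-endian integers. The names write_frame and read_frame are
hypothetical helpers, not part of R, and the frame layout is an assumption
chosen for illustration. The header is read one byte at a time with
readBin so that only binary reads touch the connection:

```r
# Hypothetical helpers, not part of R or of the original report.
# Assumed frame layout: "<nbytes>\n" header, then 4-byte big-endian integers.

write_frame <- function(con, x) {
  # Serialize the payload first so the header can carry its byte count.
  payload <- writeBin(as.integer(x), raw(), size = 4, endian = "big")
  header <- charToRaw(paste0(length(payload), "\n"))
  # Send header and payload in a single binary write.
  writeBin(c(header, payload), con)
  invisible(length(payload))
}

read_frame <- function(con) {
  # Collect header bytes up to the newline using binary reads only.
  header <- raw(0)
  repeat {
    b <- readBin(con, "raw", n = 1)
    if (length(b) == 0) stop("no more input before the header was complete")
    if (b == as.raw(0x0a)) break
    header <- c(header, b)
  }
  nbytes <- as.integer(rawToChar(header))
  # Read exactly the number of integers announced by the header.
  readBin(con, "integer", n = nbytes %/% 4, size = 4, endian = "big")
}
```

With the client socket opened as in variation 2 but with blocking=TRUE,
read_frame(s) should simply wait until the server's write_frame(s, 1:10)
has arrived, provided each frame is consumed before the writer needs to
move on.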
> I use socketSelect in the socket version of my snow package for the
> load balanced cluster apply to detect the first slave to finish its
> work. In my setup the final write/read pairs in each communication
> exchange are binary. With the current implementation this ensures
> that the read completely empties the buffer and so this problem does
> not bite. It sounds like the same strategy should allow you to work
> with the current implementation.
>
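For several sockets, that strategy might look roughly like the following.
first_ready and read_first_reply are hypothetical helpers (not taken from
snow), and the sketch assumes that each peer's last write in an exchange
is binary and is read in full, so that no unread input is left buffered
when socketSelect is called:

```r
# Hypothetical helper: index of the first socket with input waiting,
# or NA if the timeout (in seconds; NULL waits indefinitely) expires.
first_ready <- function(socks, timeout = NULL) {
  ready <- socketSelect(socks, write = FALSE, timeout = timeout)
  which(ready)[1]
}

# Read one binary reply of n 4-byte big-endian integers from whichever
# socket becomes readable first. The reply format is an assumption.
read_first_reply <- function(socks, n, timeout = NULL) {
  i <- first_ready(socks, timeout)
  if (is.na(i)) return(NULL)
  list(index = i,
       value = readBin(socks[[i]], "integer", n = n, size = 4, endian = "big"))
}
```

Here socks is assumed to be a list of already opened socket connections,
one per peer; the index in the result says which peer answered first.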
Both the socketSelect problem and the segfault when reading from
non-blocking sockets are now fixed in R-devel.
Best,
luke
--
Luke Tierney
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu