[Rd] PATCH: Asserting that 'connection' used has not changed + R_GetConnection2()
Henrik Bengtsson
henrik@bengt@@on @ending from gm@il@com
Wed Oct 31 03:51:49 CET 2018
SUMMARY:
I'm proposing that R assert that 'connection' options have not changed
since first created such that R will produce the following error:
> fh <- file("a.txt", open = "w+")
> cat("hello\n", file = fh)
> close(fh)
> fh2 <- file("b.txt", open = "w+")
> cat("world\n", file = fh2)
> cat("hello again\n", file = fh)
Error in cat("hello again\n", file = fh) :
invalid connection (non-existing 'conn_id')
Note that, currently in R, the latter silently writes to 'b.txt' - not
'a.txt' (for more details, see
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81).
BACKGROUND:
In R, connections are indexed by their (zero-based) row indices in the
table of available connections. For example,
> fh <- file("a.txt", open = "w")
> showConnections(all = TRUE)
description class mode text isopen can read can write
0 "stdin" "terminal" "r" "text" "opened" "yes" "no"
1 "stdout" "terminal" "w" "text" "opened" "no" "yes"
2 "stderr" "terminal" "w" "text" "opened" "no" "yes"
3 "a.txt" "file" "w" "text" "opened" "no" "yes"
> con <- getConnection(3)
> identical(con, fh)
[1] TRUE
ISSUE:
The problem with the current design/implementation where connections
are referred to by their index (only), is that
(i) the table of connections changes over time and
(ii) connection indices are recycled.
Because a `connection` object holds the connection row index, it means
that *the actual underlying connection that a `connection` object
refers to may change over its lifetime*.
SUGGESTION:
Make use of the 'Rconn' struct field 'id', which is unique, to assert
that the 'connection' object used is referring to the
original/expected connection. The 'id' field is available via
attribute 'conn_id' part of a 'connection' object.
PATCH:
See attached 'connection.patch' file (or
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81#issuecomment-434210222).
The patch introduces a new SEXP R_GetConnection2(SEXP sConn) function,
which looks up a connection by its index *and* the 'id' field. This
function is backward compatible with R_GetConnection(), which looks up
a connection by its index (only). In addition, R_GetConnection2() also
accepts 'sConn' of type integer, which the looks up the connection
similar to how the internal getConnection() function does it.
Comment: The patch is just one of many alternatives. Hopefully, it
helps clarify what I'm suggesting. It passes 'make check' and I've
tested it on a few packages of mine that make heavy use of different
types of connections.
In addition to "overridden" connections, the patch protects against
invalid 'connection':s that have been serialized, e.g.
> fh2 <- file("b.txt", open = "w+")
> saveRDS(fh2, file = "fh2.rds")
> fh3 <- readRDS("fh2.rds")
> attr(fh2, "conn_id")
<pointer: 0x78>
> attr(fh3, "conn_id")
<pointer: (nil)> #<== NIL because external pointer was lost when serialized
> isOpen(fh2)
[1] TRUE
> isOpen(fh3)
Error in isOpen(fh3) : invalid connection ('conn_id' is NULL)
This is useful, when for instance 'connection':s are (incorrectly)
passed to background R sessions (e.g. PSOCK cluster nodes).
SEE ALSO:
* More details of the above are scribbled down on
https://github.com/HenrikBengtsson/Wishlist-for-R/issues/81
* R-devel post 'closeAllConnections() can really mess things up',
2016-10-30, https://stat.ethz.ch/pipermail/r-devel/2016-October/073331.html
All the best,
Henrik
-------------- next part --------------
A non-text attachment was scrubbed...
Name: connections.patch
Type: text/x-patch
Size: 15539 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20181030/d2a946b2/attachment.bin>
More information about the R-devel
mailing list