[Rd] read.table messes up stdin upon small, erroneous input (PR#7722)
jtk at cmp.uea.ac.uk
Fri Mar 11 14:49:08 CET 2005
Full_Name: Jan T. Kim
Version: 2.0.1, devel-2005-02-24
OS: Linux 2.6.x
Submission from: (NULL) (184.108.40.206)
Run read.table(stdin()) and type in the broken table, terminating the
input by pressing Ctrl-D at the 3rd line of input. An error message from
scan, complaining that "line 2 did not have 2 elements", appears, as
expected. However, after this, there are three empty lines buffered in
 "" "" ""
Repeated attempts to read.table the broken input from stdin lead to even more
0: 1 2
2: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
line 2 did not have 2 elements
3: 1 2
 V1 V2
<0 rows> (or 0-length row.names)
Analysis: These effects are due to a combination of (1) the fact that
there appear to be various routes for accessing the standard input,
depending on context, and (2) the use of pushback in the process of
automatically figuring out the table format:
* read.table uses .Internal(readTableHead(...)) to get the first
nlines lines of the table (nlines = 5).
* .Internal(readTableHead(...)) always returns nlines lines, adding
empty lines if EOF comes before nlines lines are read.
* These lines, including any empty ones not originating from the
file in the first place, are then pushed back twice.
* The first set of lines is always consumed by the subsequent
code to figure out the number of columns.
* The second set is intended to be consumed by the regular operation
of scan.
* However, if scan chokes before it can consume these lines, including
the blank ones, these will be left in the pushback buffer.
* R's interactive fetch-parse-evaluate loop does not use the connection
provided by stdin(), and therefore the buffered lines are not
noticed until the next attempt to read from the stdin connection.
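The sequence listed above can be reproduced without a terminal. The following is a minimal sketch, assuming that a textConnection can stand in for stdin() and that lines pushed back in a single pushBack() call are read back in their original order (as in the example on the pushBack help page). pushBack(), pushBackLength(), readLines() and scan() are base R; the padding step and the "consume the first copy" step are hand-written approximations of the bullets, not read.table's actual code.

```r
# Simulation of the pushback sequence on a textConnection instead of stdin().
# Assumption: the padding and the consumption of the first copy only
# approximate what read.table does internally.
con <- textConnection(c("1 2", "3"))  # broken table: line 2 has 1 field
nlines <- 5L

# Stand-in for .Internal(readTableHead(...)): read up to nlines lines and,
# as described in the report, pad with empty lines up to nlines.
head_lines <- readLines(con, n = nlines)
head_lines <- c(head_lines, rep("", nlines - length(head_lines)))
# head_lines is now "1 2" "3" "" "" ""

# Push the head back twice.
pushBack(c(head_lines, head_lines), con)
n_after_push <- pushBackLength(con)   # 10

# The first copy is consumed while figuring out the number of columns.
invisible(readLines(con, n = nlines))

# The "regular" scan chokes on line 2 before reaching the padding lines.
res <- try(scan(con, what = list("", ""), multi.line = FALSE, quiet = TRUE),
           silent = TRUE)

# The three padding lines remain in the pushback buffer and are returned
# by the next read from this connection.
leftover_n <- pushBackLength(con)     # 3
leftover <- readLines(con)            # "" "" ""
close(con)
```

On this input the leftover consists of exactly the three padding lines, which is consistent with the three empty lines observed after the failed read.table(stdin()) call.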
The strange effects reported above could probably be fixed by modifying
the internal readTableHead function such that it does not produce empty
lines in order to return the number of lines "requested" by the nlines
argument.
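To illustrate this more local fix, here is a sketch of the same sequence with the padding step removed, i.e. a hypothetical head reader that returns only the lines actually read. It relies on the same assumptions as a textConnection-based simulation (base R pushBack(), pushBackLength(), readLines(), scan()) and is not a patch to readTableHead.

```r
# Hypothetical non-padding head reader: keep only the lines actually read.
con <- textConnection(c("1 2", "3"))
nlines <- 5L
head_lines <- readLines(con, n = nlines)  # "1 2" "3", no empty filler lines

# Push the head back twice, as in the report.
pushBack(c(head_lines, head_lines), con)

# First copy: consumed while determining the number of columns.
invisible(readLines(con, n = length(head_lines)))

# Second copy: scan still fails on line 2 ...
res <- try(scan(con, what = list("", ""), multi.line = FALSE, quiet = TRUE),
           silent = TRUE)

# ... but here nothing is left behind, since line 2 was the last head line.
leftover_n <- pushBackLength(con)         # 0
close(con)
```

For this particular input nothing remains, because scan fails on the last line of the head. If scan failed earlier in a longer head, the genuine lines after the failure point would still be left in the pushback buffer, which is one argument for the more fundamental approach.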
A more fundamental approach would be to avoid pushing back lines
altogether. The repeated scanning of the first few lines could be
done by using a textConnection instead. Some additional work will
probably be necessary to combine the first few and the remaining
lines, acquired by regular operation of scan, into the complete table.
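A rough sketch of the textConnection approach follows. The function read_table_no_pushback() is hypothetical: it keeps the head lines in memory, inspects and parses them through private textConnections, scans the remaining lines straight from the original connection, and combines the two parts column by column. It assumes whitespace-separated fields, keeps every column as character, and takes the column count as the maximum of count.fields() over the head, all of which are simplifications of what read.table does.

```r
# Hypothetical helper: read.table-like parsing without pushBack().
read_table_no_pushback <- function(con, nlines = 5L) {
  head_lines <- readLines(con, n = nlines)  # no padding, nothing pushed back
  if (length(head_lines) == 0L) stop("no lines available in input")

  # Determine the number of columns from a private copy of the head.
  tc1 <- textConnection(head_lines)
  on.exit(close(tc1))
  ncols <- max(count.fields(tc1))  # NA if a head line has an unbalanced quote

  # Parse the head from a second private copy ...
  what <- rep(list(""), ncols)
  tc2 <- textConnection(head_lines)
  on.exit(close(tc2), add = TRUE)
  part1 <- scan(tc2, what = what, multi.line = FALSE, quiet = TRUE)

  # ... and the remaining lines directly from the original connection.
  part2 <- scan(con, what = what, multi.line = FALSE, quiet = TRUE)

  # Combine head and remainder column by column.
  cols <- Map(c, part1, part2)
  names(cols) <- paste0("V", seq_len(ncols))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Well-formed input: the head (2 lines) and the rest (1 line) are combined.
good <- textConnection(c("1 2", "3 4", "5 6"))
tbl <- read_table_no_pushback(good, nlines = 2L)
close(good)

# Broken input: scan() still fails on line 2, but nothing is pushed back.
bad <- textConnection(c("1 2", "3"))
res <- try(read_table_no_pushback(bad), silent = TRUE)
bad_leftover_n <- pushBackLength(bad)  # 0
close(bad)
```

Because nothing is ever pushed back onto the caller's connection, a parse error cannot leave phantom lines behind; the head lines consumed by readLines() are simply gone, as they would be with any other reader.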