[R] Stricter read.table?
bbolker at gmail.com
Sat Dec 11 00:06:27 CET 2010
Stavros Macrakis <macrakis <at> alum.mit.edu> writes:
> read.table gives idiosyncratic results when the input is formatted
> strangely, for example:
> => "c'd" "a'b" "c'd"
> => "f'" "\na" "b" "c'd" "f'" "\n"
> Though read.table doesn't specify the syntax of its input precisely, these
> results don't seem particularly useful or consistent.
> Is there a stricter version of read.table (perhaps in a package) that gives
> errors or warnings if it finds quotation marks in the middle of fields or
> encounters other such peculiar situations?
I dissected this behavior a bit more here
(it is due to an inconsistency between the way that scan() and
readLines() handle lines with unterminated quotes, IIRC)
and Martin Maechler said
"I think it can be defended to file as a bug, but it is tricky to pinpoint
exactly what the issue is."
I don't know of a stricter version of read.table(), but if you had
the time and inclination to pick through the code and (i) provide a
careful definition of desired behavior and (ii) supply patches, you could
do your little bit to make R better. (If I posted a bug report, would you
annotate it with a discussion of desired behavior?)
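
In the meantime, a rough pre-check along the following lines might catch the
simplest case you describe (a quotation mark in the middle of a field). This is
only a sketch, not an existing function: strict_read_table() is a hypothetical
name, it assumes whitespace-separated fields (the read.table() default), and it
needs a version of R whose read.table() accepts the text argument. It is
deliberately conservative, so it also rejects a field such as "a'b" that is
legitimately wrapped in the other kind of quote.

```r
# Hypothetical helper (not part of base R): refuse to parse input in which
# a quote character appears inside a whitespace-delimited field instead of
# at the edge of the field.
strict_read_table <- function(text, quote_chars = "\"'", ...) {
  # Split the input into physical lines without opening a connection.
  lines <- unlist(strsplit(text, "\n", fixed = TRUE))

  # A quote counts as "mid-field" when a character that is neither
  # whitespace nor a quote sits on both sides of it, e.g. a'b
  pattern <- sprintf("[^[:space:]%s][%s][^[:space:]%s]",
                     quote_chars, quote_chars, quote_chars)
  bad <- grep(pattern, lines)

  if (length(bad) > 0) {
    stop("quote character inside a field on line(s): ",
         paste(bad, collapse = ", "))
  }

  # Nothing suspicious found: hand over to the ordinary parser.
  read.table(text = text, quote = quote_chars, ...)
}
```

With this, strict_read_table("a'b c'd") stops with an error, while
strict_read_table("a b\n1 2", header = TRUE) is passed through to read.table()
unchanged. It does nothing about unterminated quotes that span lines, which is
where a careful definition of the desired behavior would really be needed.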