[R] Stricter read.table?

Ben Bolker bbolker at gmail.com
Sat Dec 11 00:06:27 CET 2010


Stavros Macrakis <macrakis <at> alum.mit.edu> writes:

> 
> read.table gives idiosyncratic results when the input is formatted
> strangely, for example:
> 
> read.table(textConnection("a'b\nc'd\n"),header=FALSE,fill=TRUE,sep="",quote="'")
>   => "c'd" "a'b" "c'd"
> 
>
> read.table(textConnection("a'b\nc'd\nf'\n'\n"),header=FALSE,fill=TRUE,sep="",quote="'")
>   => "f'"  "\na" "b"   "c'd" "f'"  "\n"
> 
> Though read.table doesn't specify the syntax of its input precisely, these
> results don't seem particularly useful or consistent.
> 
> Is there a stricter version of read.table (perhaps in a package) that gives
> errors or warnings if it finds quotation marks in the middle of fields or
> encounters other such peculiar situations?

  I dissected this behavior a bit more here

<https://stat.ethz.ch/pipermail/r-devel/2010-November/059016.html>

(it is due to an inconsistency between the way that scan() and
readLines() handle lines with unterminated quotes, IIRC)

and Martin Maechler said
<https://stat.ethz.ch/pipermail/r-devel/2010-November/059107.html>
"I think it can be defended to file as a bug, but it is tricky to pinpoint
exactly what the issue is."
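
For anyone who wants to poke at the scan()/readLines() difference mentioned above, here is a minimal sketch that feeds the first example string to both functions and prints what each returns. The variable names are mine, and I make no claim about what scan() prints on any particular R version; the point is only to put the two results side by side.

```r
# Same text as in the first example of the original question.
txt <- "a'b\nc'd\n"

# readLines() splits on newlines only; it has no notion of quoting.
con <- textConnection(txt)
lines_seen <- readLines(con)
close(con)

# scan() is told that ' is a quote character, as read.table() would tell it.
con <- textConnection(txt)
tokens_seen <- scan(con, what = "", sep = "", quote = "'", quiet = TRUE)
close(con)

print(lines_seen)
print(tokens_seen)
```

If the two disagree about how many records or fields the input holds, that disagreement is the kind of thing a bug report would need to spell out.
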
   I don't know of a stricter version of read.table(), but if you had
the time and inclination to pick through the code and (i) provide a
careful definition of desired behavior and (ii) supply patches, you could
do your little bit to make R better. (If I posted a bug report would you
annotate it with a discussion of desired behavior?)
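
In the meantime, a rough workaround (not an existing function; `strict_read_table` is a hypothetical name) would be to check the raw lines yourself before handing them to read.table(). The sketch below assumes a single quote character and treats any line with an odd number of them as an error. That is only one possible definition of "strict": it rejects legitimate quoted fields that span lines, and it does not catch balanced quotes in the middle of a field.

```r
# Hypothetical helper, not part of base R or any package I know of.
# Assumes `quote` is a single character and `lines` is a character vector
# with one input line per element.
strict_read_table <- function(lines, quote = "'", ...) {
  # Count occurrences of the quote character on each line.
  hits <- regmatches(lines, gregexpr(quote, lines, fixed = TRUE))
  n_quotes <- lengths(hits)

  # An odd count means a quote was opened but never closed on that line.
  odd <- which(n_quotes %% 2 == 1)
  if (length(odd) > 0) {
    stop("unbalanced quote character on line(s): ",
         paste(odd, collapse = ", "))
  }

  # Input looks sane under this (narrow) definition; parse as usual.
  read.table(text = lines, quote = quote, ...)
}

# Well-formed input parses normally.
ok <- strict_read_table(c("'a b' c", "d e"), header = FALSE)

# The input from the original question is rejected instead of being mangled.
bad <- try(strict_read_table(c("a'b", "c'd"), header = FALSE), silent = TRUE)
```

This needs a version of R where read.table() accepts a `text` argument and `lengths()` is available.
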


