[Rd] Bug in read.table?

Ben Bolker bbolker at gmail.com
Tue Nov 16 02:59:26 CET 2010


Ben Bolker <bbolker <at> gmail.com> writes:

> 
> Ben Bolker <bbolker <at> gmail.com> writes:
> 
>    Can simplify this still further:
> 
> a b'c
> d e'f
> g h'i

  With read.table(), this example file produces duplicated lines.
Arguably it should behave analogously to scan():

> scan(what="")
1: a b'c
3: d e'f
5: g h'i
7: Read 6 items
[1] "a"   "b'c" "d"   "e'f" "g"   "h'i"
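
  For anyone who wants to try this from a file rather than the console,
here is a minimal sketch. The temporary file and the variable names
(tmp, items, tab) are only illustrative, and what read.table() returns
will depend on the R version, so no output is shown for it.

```r
# Write the three-line example to a temporary file (illustrative name).
tmp <- tempfile(fileext = ".txt")
writeLines(c("a b'c", "d e'f", "g h'i"), tmp)

# scan() on the file, using the same what = "" as the console session.
items <- scan(tmp, what = "", quiet = TRUE)
print(items)

# read.table() with its default quote handling. On an affected version
# the rows may come back duplicated; try() keeps the script running
# if the call fails instead.
tab <- try(read.table(tmp, stringsAsFactors = FALSE), silent = TRUE)
print(tab)
```

  Comparing the printed items with the printed table should show whether
the two functions agree on a given installation.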


> 
> >  One of the first things that happens in read.table is that
> > the first few lines are read with readTableHead:
> > 
> >   lines <- .Internal(readTableHead(file, nlines, comment.char, 
> >        blank.lines.skip, quote, sep))
> > 
>   In this case, readTableHead reads the first two lines as one line:
> the single quote at position 4 of the first line is closed by the one
> at position 4 of the second line, so the first newline does not
> end the line.
> 
>   R then pushes back two copies of the lines that have
> been read (this is normal behavior; I don't quite follow the
> logic).
> 
>   The rest of the file is read with scan(), one line at a time.
> However, readTableHead and scan() disagree about newlines in the
> middle of quoted strings: readTableHead ignores them, whereas scan()
> treats them as the end of the quoted string.
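
  If the apostrophes in the data are meant literally, one possible way
around the discrepancy is to switch off quote handling altogether with
the quote argument of read.table(). This is only a sketch under that
assumption (it would not be appropriate for a file that really does
contain quoted fields), and the names tmp and df are illustrative.

```r
# Recreate the three-line example in a temporary file (illustrative name).
tmp <- tempfile(fileext = ".txt")
writeLines(c("a b'c", "d e'f", "g h'i"), tmp)

# quote = "" disables quote interpretation, so the apostrophes are
# treated as ordinary characters while the file is read.
df <- read.table(tmp, quote = "", stringsAsFactors = FALSE)
print(df)
```

  With quoting disabled, each input line should come back as exactly one
row with two fields.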


  Ping?
  I think this counts as a small, but real, bug. Should I go ahead
and report it as such, or would someone explain why it's not a bug?

  cheers
    Ben Bolker


