[Rd] Bug in read.table?
Ben Bolker
bbolker at gmail.com
Tue Nov 16 02:59:26 CET 2010
Ben Bolker <bbolker <at> gmail.com> writes:
>
> Ben Bolker <bbolker <at> gmail.com> writes:
>
> I can simplify this still further:
>
> a b'c
> d e'f
> g h'i
Reading this example file with read.table() leads to duplicated lines.
Arguably its behavior should be analogous to:
> scan(what="")
1: a b'c
3: d e'f
5: g h'i
7: Read 6 items
[1] "a" "b'c" "d" "e'f" "g" "h'i"
>
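A minimal reproduction script, assuming only base R. The temporary file name and the variable names are illustrative. scan() is expected to return the six items shown in the session above. The default read.table() call is wrapped in tryCatch() because its result (duplicated rows, or possibly an error) may depend on the R version. The quote = "" call is a workaround, not something proposed in this thread.

```r
# Write the three-line example file to a temporary location.
tf <- tempfile(fileext = ".txt")
writeLines(c("a b'c", "d e'f", "g h'i"), tf)

# As in the console session above, scan() keeps the apostrophes
# inside the tokens and returns 6 items.
items <- scan(tf, what = "", quiet = TRUE)
print(items)

# read.table() with its default quote = "\"'": the reported symptom
# is duplicated rows. Behaviour may differ between R versions, so an
# error (if any) is caught and its message printed instead.
default_tab <- tryCatch(
  read.table(tf, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(default_tab)

# Workaround (assumption, not from the message): disable quoting so
# apostrophes are ordinary characters; this yields 3 rows, 2 columns.
fixed_tab <- read.table(tf, quote = "", stringsAsFactors = FALSE)
print(fixed_tab)

unlink(tf)
```

Comparing default_tab with fixed_tab on a given R version shows whether the symptom is present. quote = "" is only a safe workaround when the data contain no genuinely quoted fields.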
> > One of the first things that happens in read.table is that
> > the first few lines are read with readTableHead:
> >
> > lines <- .Internal(readTableHead(file, nlines, comment.char,
> > blank.lines.skip, quote, sep))
> >
> In this case, it reads the first two lines as a single line:
> the single quote at position 4 of the first line is closed by
> the one at position 4 of the second line, so the first newline
> does not end a line.
>
> R then pushes back two copies of the lines that have
> been read (this is normal behavior, though I don't quite
> follow the logic).
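To make "pushes back two copies" concrete, here is a small illustration using base R's pushBack() and pushBackLength() on a textConnection(). The three-line input is made up for the demonstration, and this is not read.table()'s own code. It only shows that pushed-back lines are returned before the connection's remaining input, in the order they were pushed.

```r
# A readable text-mode connection with three lines (made-up input).
con <- textConnection(c("line 1", "line 2", "line 3"))

# Consume the first two lines, as a "head" reader would.
head_lines <- readLines(con, n = 2)

# Push both lines back twice; they will be read before "line 3".
pushBack(c(head_lines, head_lines), con)
print(pushBackLength(con))  # number of lines currently pushed back

# Pushed-back lines come first, then the rest of the connection.
rest <- readLines(con)
close(con)
print(rest)
```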
>
> The rest of the file is read with scan(), one line at a time.
> However, there is a discrepancy between the way
> that readTableHead interprets newlines inside
> quoted strings (it ignores them) and the way that scan()
> interprets them (it treats them as the end of the quoted string).
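A toy model of the discrepancy in plain R. join_quoted_lines() is a hypothetical helper, not R's C implementation. With field_start_only = FALSE, a quote character anywhere opens a quoted string that can run past a newline, which mimics the readTableHead behavior described above. With field_start_only = TRUE, a quote counts only at the start of a field, which for this file agrees with the scan() output shown earlier.

```r
# Hypothetical helper (toy model only): group physical lines into
# logical records, letting an open quoted string span newlines.
join_quoted_lines <- function(lines, quote = "\"'", field_start_only = FALSE) {
  qchars <- strsplit(quote, "")[[1]]
  out <- character(0)
  buf <- character(0)
  inquote <- ""                 # "" means not inside a quoted string
  for (ln in lines) {
    at_field_start <- TRUE      # a new physical line starts at a field boundary
    for (ch in strsplit(ln, "")[[1]]) {
      if (nzchar(inquote)) {
        if (ch == inquote) inquote <- ""   # matching quote closes the string
      } else if (ch %in% qchars && (!field_start_only || at_field_start)) {
        inquote <- ch                      # this quote opens a string
      }
      at_field_start <- ch %in% c(" ", "\t")
    }
    buf <- c(buf, ln)
    if (!nzchar(inquote)) {     # record complete: flush the buffer
      out <- c(out, paste(buf, collapse = "\n"))
      buf <- character(0)
    }
  }
  # An unterminated quote at end of input leaves a partial record.
  if (length(buf) > 0) out <- c(out, paste(buf, collapse = "\n"))
  out
}

x <- c("a b'c", "d e'f", "g h'i")
print(join_quoted_lines(x))                           # quotes anywhere
print(join_quoted_lines(x, field_start_only = TRUE))  # quotes at field start only
```

The first call returns two records, the first being lines 1 and 2 joined by a newline (the joining described above). The second call returns the three lines unchanged.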
Ping?
I think this counts as a small but real bug. Should I go ahead
and report it as such, or could someone explain why it's not a bug?
cheers
Ben Bolker
More information about the R-devel mailing list