[Rd] Bug in read.table?
peter dalgaard
pdalgd at gmail.com
Tue Nov 16 14:04:16 CET 2010
On Nov 16, 2010, at 02:59, Ben Bolker wrote:
> Ben Bolker <bbolker <at> gmail.com> writes:
>
>>
>> Ben Bolker <bbolker <at> gmail.com> writes:
>>
>>>
>>>
>>
>> Can simplify this still further:
>>
>> a b'c
>> d e'f
>> g h'i
>
> Reading this example file with read.table() leads to duplicated lines.
> Arguably it should have behavior analogous to:
>
>> scan(what="")
> 1: a b'c
> 3: d e'f
> 5: g h'i
> 7:
> Read 6 items
> [1] "a" "b'c" "d" "e'f" "g" "h'i"
>
>
>>
>>> One of the first things that happens in read.table is that
>>> the first few lines are read with readTableHead:
>>>
>>> lines <- .Internal(readTableHead(file, nlines, comment.char,
>>> blank.lines.skip, quote, sep))
>>>
>> In this case, this reads the first two lines as one line:
>> the single quote at position 4 of the first line is closed
>> by the one at position 4 of the second line, so the first
>> newline does not end the line.
>>
>> R then pushes back two copies of the lines that have
>> been read (this is normal behavior; I don't quite follow the
>> logic).
>>
>> The rest of the file is read with scan(), one line at a time.
>> However, there is a discrepancy between the way
>> that readTableHead interprets newlines in the middle of
>> quoted strings (it ignores them) and the way that scan()
>> interprets them (it takes them as the end of the quoted string).
>
>
> Ping?
> I think this counts as a small, but real, bug. Should I go ahead
> and report it as such, or would someone explain why it's not a bug?
>
I think filing it as a bug can be defended, but it is tricky to pinpoint exactly what the issue is. E.g., notice that adding a few spaces changes the behaviour of scan() considerably:
> scan(what="")
1: a b 'c
1: d e' f
5: g h' i
8:
Read 7 items
[1] "a" "b" "c\nd e" "f" "g" "h'" "i"
(I'm confused... What is it that we really want here?)
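To compare the two inputs outside an interactive session, something like the following sketch may help. It assumes scan() tokenizes a file the same way it tokenizes console input, and tokens() is a made-up helper for illustration, not an R function.

```r
# Sketch only: write each example to a temporary file and tokenize it
# with scan(). `tokens` is a hypothetical helper, not part of R.
tokens <- function(lines) {
  f <- tempfile()
  on.exit(unlink(f))              # remove the temporary file afterwards
  writeLines(lines, f)
  scan(f, what = "", quiet = TRUE)
}

# Quote embedded in the middle of a field (the original example)
tight <- tokens(c("a b'c", "d e'f", "g h'i"))

# Same characters, but with a space before each quote so that the
# quote starts a field
spaced <- tokens(c("a b 'c", "d e' f", "g h' i"))

print(tight)
print(spaced)
```

The only difference between the two inputs is whether the quote character starts a field, which seems to be what decides whether scan() treats it as opening a quoted string.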
Also, as you noted originally, beware the "Well don't do that then" aspect...
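If the apostrophes are just data, one way to follow that advice is to switch quote processing off altogether. A minimal sketch, assuming the three-line example file and the default whitespace separator (this sidesteps the ambiguity, it does not settle which behaviour is right):

```r
# Sketch only: read the example with quote processing disabled, so the
# apostrophe is treated as an ordinary character.
f <- tempfile()
writeLines(c("a b'c", "d e'f", "g h'i"), f)

# quote = "" means no character is treated as a quoting character
dat <- read.table(f, quote = "", stringsAsFactors = FALSE)
unlink(f)

print(dat)
```

This is only appropriate when no field actually relies on quoting, e.g. to protect embedded whitespace.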
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com