[Rd] Inconsistency, maybe a bug, in read.delim?
Detlef Steuer
steuer at hsu-hh.de
Mon Mar 19 14:23:32 CET 2018
Dear friends,
I stumbled upon behaviour of read.delim that I would consider a bug,
or at least an inconsistency that should be improved upon.
Recently we had to work with data that used "", two double quotes, as
the symbol marking the start and end of character input.
Essentially, the data looked like this:
data.csv
========
V1, V2, V3
""data"", 3, """"
The final sequence, """", indicates a missing value.
One obvious way to read this data is to apply gsub() first,
but that's not the point I want to make.
Consider this case, which we found during testing:
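For completeness, a minimal sketch of that gsub() workaround. It assumes the file is small enough to hold in memory; raw_lines stands in for what readLines("data.csv") would return, and the variable names are made up for illustration.

```r
# Hypothetical in-memory copy of data.csv from above
raw_lines <- c('V1, V2, V3',
               '""data"", 3, """"')

# Collapse every doubled quote pair into a single quote (literal match, no regex)
fixed_lines <- gsub('""', '"', raw_lines, fixed = TRUE)
# fixed_lines[2] is now: "data", 3, ""

# Ordinary CSV parsing now applies; strip.white drops the blanks after commas
dat <- read.csv(text = fixed_lines, strip.white = TRUE)
str(dat)
```

The empty quoted field left over from """" is read as a blank, which read.csv turns into NA, matching the intended "missing" meaning.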
test.csv
========
V1, V2, V3, V4
"""", """", 3, ""
If you read it with
> read.delim("test.csv", sep=",", header=TRUE, na.strings="\"")
you get the following:
  V1 V2 V3 V4
1 NA  "  3 NA
(and a warning)
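For anyone who wants to try this, a minimal self-contained sketch that writes the test.csv lines above to a temporary file and reads them back with the same call. It assumes the file consists of exactly these two lines plus a trailing newline, so whether the warning shows up identically may depend on the exact bytes of the original file.

```r
# Recreate test.csv in a temporary file (the tempfile() path is illustrative)
tmp <- tempfile(fileext = ".csv")
writeLines(c('V1, V2, V3, V4',
             '"""", """", 3, ""'), tmp)

# Same call as above: "," separator, header row, a lone double quote as NA string
df <- read.delim(tmp, sep = ",", header = TRUE, na.strings = "\"")

print(df)
# str() shows the exact stored strings, including any leading whitespace
str(df)

unlink(tmp)
```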
I would have expected either an error message or at
least the same result for both occurrences of """" in the
input file.
(The setting na.strings="\"" turned out to work for
a colleague with his specific data, although I think it shouldn't.)
My main concern is the different interpretation of the two """"
sequences.
Real bug? Minor inconsistency? I don't know.
All the best
Detlef