[R] The behaviour of read.csv().
Rolf Turner
r.turner at auckland.ac.nz
Fri Dec 3 00:54:03 CET 2010
I have recently been bitten by an aspect of the behaviour of
the read.csv() function.
Some lines in a (fairly large) *.csv file that I read in had
too many entries. I would have hoped that this would cause
read.csv() to throw an error, or at least issue a warning,
but it read the file without complaint, wrapping the extra
entries into an additional row of the result.
This behaviour is illustrated by the toy example in the
attached file ``junk.csv''. Just do
junk <- read.csv("junk.csv", header = TRUE)
junk
to see the problem.
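In case the attachment does not come through, a file along the following
lines (just a sketch of the same shape as junk.csv, not its exact
contents) shows the same behaviour; the over-long line falls beyond the
fifth line of input:

writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18,19"),   # one entry too many, beyond the fifth line of input
           "junk.csv")
junk <- read.csv("junk.csv", header = TRUE)
junk   # no error, no warning; the extra entry is wrapped into a further row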
If the offending over-long line is in the fourth line of data
or earlier, an error is thrown; but if it is in the fifth line
of data or later, no error is given.
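To see the position dependence with the sketch above, move the over-long
line up so that it falls within the first five lines of input (again, just
a sketch, not the attached file):

writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6,99",   # one entry too many, within the first five lines of input
             "7,8,9"),
           "junk2.csv")
read.csv("junk2.csv", header = TRUE)   # this call stops with an error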
This is in a way compatible with what the help on read.csv()
says:
The number of data columns is determined by looking at
the first five lines of input (or the whole file if it
has less than five lines), or from the length of col.names
if it is specified and is longer.
However, the help for read.table() says the same thing. And yet if
one does
gorp <- read.table("junk.csv", sep = ",", header = TRUE)
one gets an error, whereas read.csv() gives none.
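Side by side, on the same file:

gorp <- read.table("junk.csv", sep = ",", header = TRUE)   # error
junk <- read.csv("junk.csv", header = TRUE)                # no error, no warning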
Am I correct in saying that this is inappropriate behaviour on
the part of read.csv(), or am I missing something?
cheers,
Rolf Turner
P. S.:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/C/C/en_NZ.UTF-8/en_NZ.UTF-8
attached base packages:
[1] datasets utils stats graphics grDevices methods base
other attached packages:
[1] misc_0.0-13 gtools_2.6.2 spatstat_1.21-2 deldir_0.0-13
[5] mgcv_1.6-2 fortunes_1.4-0 MASS_7.3-8
loaded via a namespace (and not attached):
[1] grid_2.12.0 lattice_0.19-13 Matrix_0.999375-44 nlme_3.1-97
[5] tools_2.12.0