[R] The behaviour of read.csv().

Rolf Turner r.turner at auckland.ac.nz
Fri Dec 3 00:54:03 CET 2010


I have recently been bitten by an aspect of the behaviour of
the read.csv() function.

Some lines in a (fairly large) *.csv file that I read in had
too many entries.  I would have hoped that this would cause
read.csv() to throw an error, or at least issue a warning,
but it read the file without complaint, silently wrapping the
extra entries onto an additional row.

This behaviour is illustrated by the toy example in the
attached file ``junk.csv''.  Just do

	junk <- read.csv("junk.csv", header=TRUE)
	junk

to see the problem.
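
For anyone reading along without the attachment, a stand-in file of
roughly the same shape (not the actual junk.csv, just something with
an over-long line in the fifth data row) shows the same effect:

	writeLines(c("x,y,z", "1,2,3", "4,5,6", "7,8,9", "10,11,12",
	             "13,14,15,99",   # fifth data row: one field too many
	             "16,17,18"),
	           "junk.csv")

With that file, the stray 99 comes back, without complaint, as an
extra row of its own, padded out with NAs.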

If the offending over-long line is in the fourth line of data
or earlier, an error is thrown, but if it is in the fifth line
of data or later, no error is given.

This is, in a way, consistent with what the help for read.csv()
says:

	The number of data columns is determined by looking at
	the first five lines of input (or the whole file if it
	has less than five lines), or from the length of col.names
	if it is specified and is longer.

However, the help for read.table() says the same thing.  And yet if
one does

	gorp <- read.table("junk.csv", sep=",", header=TRUE)

one gets an error, whereas read.csv() gives none.
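
My best guess is that the difference lies in the default for the fill
argument, which is TRUE for read.csv() but FALSE for read.table()
(via fill = !blank.lines.skip), although the passage quoted above does
not mention this.  Consistent with that guess, forcing fill=FALSE makes
read.csv() complain in the same way:

	junk2 <- read.csv("junk.csv", header=TRUE, fill=FALSE)
	## fails with the same "line ... did not have 3 elements"
	## error that read.table() gives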

Am I correct in saying that this is inappropriate behaviour on
the part of read.csv(), or am I missing something?

		cheers,

			Rolf Turner



P. S.:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/C/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] datasets  utils     stats     graphics  grDevices methods   base     

other attached packages:
[1] misc_0.0-13     gtools_2.6.2    spatstat_1.21-2 deldir_0.0-13  
[5] mgcv_1.6-2      fortunes_1.4-0  MASS_7.3-8     

loaded via a namespace (and not attached):
[1] grid_2.12.0        lattice_0.19-13    Matrix_0.999375-44 nlme_3.1-97       
[5] tools_2.12.0      


