[R] The behaviour of read.csv().

Phil Spector spector at stat.berkeley.edu
Fri Dec 3 01:08:55 CET 2010

Rolf -
    I'd suggest using

     junk <- read.csv("junk.csv",header=TRUE,fill=FALSE)

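if you don't want the behaviour you're seeing.  read.csv() sets
fill=TRUE by default, whereas read.table() defaults to fill=FALSE,
which is why read.table() gives an error on your file and read.csv()
does not.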
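
A minimal sketch of the difference, assuming a made-up toy file (the
original junk.csv attachment is not reproduced here, so the contents
and the temporary file below are hypothetical).  The over-long line is
placed well past the first five lines of input, so the column count is
fixed at three before it is reached:

```r
# Hypothetical stand-in for junk.csv: header plus seven data lines,
# the last of which has one field too many.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18",
             "19,20,21,22"),
           tmp)

# Default read.csv() has fill = TRUE, so the file is read without an error.
filled <- read.csv(tmp, header = TRUE)

# With fill = FALSE the over-long line is expected to raise an error;
# tryCatch() captures the condition instead of stopping the script.
strict <- tryCatch(read.csv(tmp, header = TRUE, fill = FALSE),
                   error = function(e) e)

print(filled)
print(inherits(strict, "error"))
```

Printing filled should show where the extra entry ended up; strict
holds the error condition rather than a data frame.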

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu

On Fri, 3 Dec 2010, Rolf Turner wrote:

> I have recently been bitten by an aspect of the behaviour of
> the read.csv() function.
> Some lines in a (fairly large) *.csv file that I read in had
> too many entries.  I would have hoped that this would cause
> read.csv() to throw an error, or at least issue a warning,
> but it read the file without complaint, putting the extra
> entries into an additional line.
> This behaviour is illustrated by the toy example in the
> attached file ``junk.csv''.  Just do
> 	junk <- read.csv("junk.csv",header=TRUE)
> 	junk
> to see the problem.
> If the offending over-long line were in the fourth line of data
> or earlier, an error would be thrown, but if it is in the fifth line
> of data or later no error is given.
> This is in a way compatible with what the help on read.csv()
> says:
> 	The number of data columns is determined by looking at
> 	the first five lines of input (or the whole file if it
> 	has less than five lines), or from the length of col.names
> 	if it is specified and is longer.
> However, the help for read.table() says the same thing.  And yet if
> one does
> 	gorp <- read.table("junk.csv",sep=",",header=TRUE)
> one gets an error, whereas read.csv() gives none.
> Am I correct in saying that is inappropriate behaviour on
> the part of read.csv(), or am I missing something?
> 		cheers,
> 			Rolf Turner
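
On the wish above for an error or warning when a line has too many
entries: one rough way to check before reading is count.fields(), which
reports the number of fields on each line.  The sketch below is only an
illustration under stated assumptions: the file contents are made up,
and comparing every line against the header's field count is my choice
of rule, not something read.csv() does itself.

```r
# Hypothetical toy file: header plus seven data lines, the last over-long.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18",
             "19,20,21,22"),
           tmp)

# Number of comma-separated fields on every line, header included.
n <- count.fields(tmp, sep = ",")

# Line numbers whose field count differs from the header's.
bad <- which(n != n[1])

if (length(bad) > 0) {
  warning("lines with an unexpected number of fields: ",
          paste(bad, collapse = ", "))
}
```

If bad is empty, the file can be read with read.csv() as usual.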
