[R] The behaviour of read.csv().
Phil Spector
spector at stat.berkeley.edu
Fri Dec 3 01:08:55 CET 2010
Rolf -
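I'd suggest using
    junk <- read.csv("junk.csv", header=TRUE, fill=FALSE)
if you don't want the behaviour you're seeing. read.csv() defaults
to fill=TRUE, whereas read.table() defaults to fill=FALSE, which is
why only read.table() complains about the over-long line.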
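
As a minimal sketch of the difference (the attached junk.csv is not
reproduced here, so the toy file below is a hypothetical stand-in with
one over-long line placed after the first five lines of input):

```r
# Hypothetical stand-in for junk.csv: the sixth data line has one field too many.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18,19",
             "20,21,22"), path)

# read.csv() with its default fill=TRUE: no error is raised, and the
# surplus entry ends up in a row of its own, padded with NA.
filled <- read.csv(path, header = TRUE)
print(filled)

# With fill=FALSE the over-long line is reported as an error instead.
# tryCatch() captures the condition so the script keeps running.
strict <- tryCatch(read.csv(path, header = TRUE, fill = FALSE),
                   error = function(e) e)
print(conditionMessage(strict))
```

The names path, filled and strict are only for illustration.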
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Fri, 3 Dec 2010, Rolf Turner wrote:
>
> I have recently been bitten by an aspect of the behaviour of
> the read.csv() function.
>
> Some lines in a (fairly large) *.csv file that I read in had
> too many entries. I would have hoped that this would cause
> read.csv() to throw an error, or at least issue a warning,
> but it read the file without complaint, putting the extra
> entries into an additional line.
>
> This behaviour is illustrated by the toy example in the
> attached file ``junk.csv''. Just do
>
>     junk <- read.csv("junk.csv", header=TRUE)
>     junk
>
> to see the problem.
>
> If the offending over-long line were in the fourth line of data
> or earlier, an error would be thrown, but if it is in the fifth line
> of data or later no error is given.
>
> This is in a way compatible with what the help on read.csv()
> says:
>
>     The number of data columns is determined by looking at
>     the first five lines of input (or the whole file if it
>     has less than five lines), or from the length of col.names
>     if it is specified and is longer.
>
> However, the help for read.table() says the same thing. And yet if
> one does
>
>     gorp <- read.table("junk.csv", sep=",", header=TRUE)
>
> one gets an error, whereas read.csv() gives none.
>
> Am I correct in saying that this is inappropriate behaviour on
> the part of read.csv(), or am I missing something?
>
> cheers,
>
> Rolf Turner
>
>
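
If the aim is to catch over-long lines anywhere in the file before
reading it, one possible check is to compare per-line field counts
with count.fields(). This is only a sketch: the toy file is a
hypothetical stand-in for junk.csv, and taking the header line as the
expected width is an assumption.

```r
# Hypothetical stand-in for junk.csv with one over-long line (file line 7).
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c",
             "1,2,3",
             "4,5,6",
             "7,8,9",
             "10,11,12",
             "13,14,15",
             "16,17,18,19",
             "20,21,22"), path)

# Count the fields on every line, using the same quote and comment
# settings that read.csv() uses by default.
n_fields <- count.fields(path, sep = ",", quote = "\"", comment.char = "")

# Assumption: the header line defines the expected number of columns.
expected <- n_fields[1]
bad_lines <- which(n_fields != expected)

if (length(bad_lines) > 0) {
  message("Lines with an unexpected number of fields: ",
          paste(bad_lines, collapse = ", "))
}
```

The line numbers in bad_lines count the header as line 1, so they
refer to lines of the file rather than rows of the data frame.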