[Rd] read.table() with quoted integers
Milan Bouchet-Valat
nalimilan at club.fr
Mon Sep 30 14:33:23 CEST 2013
Hi!
It seems that read.table() in R 3.0.1 (Linux 64-bit) does not accept
quoted integers as valid values for columns declared with
colClasses="integer". But when colClasses is omitted, the same columns
are read as integer anyway.
For example, let's consider a file named file.dat, containing:
"1"
"2"
> read.table("file.dat", colClasses="integer")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'an integer', got '"1"'
But:
> str(read.table("file.dat"))
'data.frame': 2 obs. of 1 variable:
$ V1: int 1 2
The latter result is indeed documented in ?read.table:
Unless ‘colClasses’ is specified, all columns are read as
character columns and then converted using ‘type.convert’ to
logical, integer, numeric, complex or (depending on ‘as.is’)
factor as appropriate. Quotes are (by default) interpreted in all
fields, so a column of values like ‘"42"’ will result in an
integer column.
Should the former behavior be considered a bug?
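For reference, here is a minimal script that runs both reads on the same data. It is only a sketch: the temporary file stands in for file.dat, and the last step (reading the column as character and converting it by hand) is a possible workaround I am assuming, not something ?read.table documents for this case.

```r
# Temporary stand-in for file.dat (the path is an assumption for illustration)
f <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), f)

# Explicit class: capture the outcome instead of stopping on an error
explicit <- tryCatch(read.table(f, colClasses = "integer"),
                     error = function(e) conditionMessage(e))

# No colClasses: fields are read as character, then passed to type.convert()
guessed <- read.table(f)

# Possible workaround: read as character (quotes are stripped), convert by hand
manual <- read.table(f, colClasses = "character")
manual$V1 <- as.integer(manual$V1)

print(explicit)
str(guessed)
str(manual)
```

Here `explicit` holds either the data frame or the error message, so the script runs to the end whichever way read.table() behaves.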
The error creates problems when combined with read.table.ffdf from package
ff, since this function guesses the column classes by reading the first
rows of the file, and then passes them as colClasses to read.table() to
read the remaining rows in chunks. A column of quoted integers is correctly
detected as integer in the first read, but read.table() fails in
subsequent reads.
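The two-step pattern can be imitated in base R, without ff. This is a rough sketch of the idea only, not the actual code of read.table.ffdf: the chunk size of 2, the four-row file and the variable names are all made up for illustration.

```r
# Temporary file with four quoted integers (illustrative data)
f <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"', '"3"', '"4"'), f)

# Step 1: read the first rows and let read.table() guess the classes
first <- read.table(f, nrows = 2)
classes <- vapply(first, function(col) class(col)[1], character(1))

# Step 2: read the remaining rows, passing the guessed classes back in;
# the outcome is captured so the script does not stop on an error
rest <- tryCatch(read.table(f, skip = 2, colClasses = unname(classes)),
                 error = function(e) conditionMessage(e))

print(classes)
print(rest)
```

The first read reports the column as integer, and that guess is exactly what gets handed to the second read.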
Regards