[Rd] read.table() with quoted integers

Milan Bouchet-Valat nalimilan at club.fr
Mon Sep 30 14:33:23 CEST 2013


Hi!


It seems that read.table() in R 3.0.1 (Linux 64-bit) does not accept
quoted integers as values in columns declared with
colClasses="integer". But when colClasses is omitted, these same columns
are read as integer anyway.

For example, let's consider a file named file.dat, containing:
"1"
"2"

> read.table("file.dat", colClasses="integer")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : 
  scan() expected 'an integer' and got '"1"'

But:
> str(read.table("file.dat"))
'data.frame':	2 obs. of  1 variable:
 $ V1: int  1 2

The latter result is indeed documented in ?read.table:
     Unless ‘colClasses’ is specified, all columns are read as
     character columns and then converted using ‘type.convert’ to
     logical, integer, numeric, complex or (depending on ‘as.is’)
     factor as appropriate.  Quotes are (by default) interpreted in all
     fields, so a column of values like ‘"42"’ will result in an
     integer column.
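
To make the two cases easy to reproduce without creating file.dat by hand, here is a minimal self-contained sketch. The temporary file path and the variable names (path, guessed, forced) are my own choices for illustration; the error is captured with tryCatch() so the script runs to the end whatever this R version does.

```r
# Write the two quoted integers to a temporary file (stand-in for file.dat).
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Without colClasses: the column is read as character, quotes are
# interpreted, and type.convert() then turns the values into integers.
guessed <- read.table(path)
str(guessed)

# With colClasses = "integer": keep the error message (if any) instead of
# stopping, so both cases can be compared in one run.
forced <- tryCatch(
  read.table(path, colClasses = "integer"),
  error = function(e) conditionMessage(e)
)
print(forced)
```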


Should the former behavior be considered a bug?

This creates problems when combined with read.table.ffdf from package
ff, since this function tries to guess the column classes by reading the
first rows of the file, and then passes colClasses to read.table to read
the remaining rows in chunks. A column of quoted integers is correctly
detected as integer in the first read, but read.table() then fails on
the subsequent reads.
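
As a possible workaround in plain read.table() terms, one could read the affected columns as character (so that the quotes are interpreted) and convert them afterwards. The sketch below is only an illustration under that assumption: read_with_quoted_ints() and its classes argument are hypothetical names, not part of base R or ff, and I have not tried to plug this into read.table.ffdf.

```r
# Hypothetical helper: 'classes' must give one class name per column.
read_with_quoted_ints <- function(file, classes, ...) {
  is_int <- classes == "integer"
  # Ask read.table() for character instead of integer, so that quoted
  # values such as "1" are accepted and the quotes are stripped.
  as_read <- ifelse(is_int, "character", classes)
  df <- read.table(file, colClasses = as_read, ...)
  # Convert afterwards; values that are not integers become NA (with a
  # warning) rather than raising an error.
  df[is_int] <- lapply(df[is_int], as.integer)
  df
}

# Small usage example with one quoted integer column and one text column.
path <- tempfile(fileext = ".dat")
writeLines(c('"1" "a"', '"2" "b"'), path)
res <- read_with_quoted_ints(path, classes = c("integer", "character"))
str(res)
```

Note that this is more lenient than colClasses="integer": malformed values silently turn into NA instead of stopping the read.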


Regards


