[Rd] read.table() with quoted integers

Joshua Ulrich josh.m.ulrich at gmail.com
Mon Sep 30 15:38:00 CEST 2013


On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> Hi!
>
>
> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
> quoted integers as an acceptable value for columns for which
> colClasses="integer". But when colClasses is omitted, these columns are
> read as integer anyway.
>
> For example, let's consider a file named file.dat, containing:
> "1"
> "2"
>
>> read.table("file.dat", colClasses="integer")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>   scan() expected 'an integer' and got '"1"'
>
> But:
>> str(read.table("file.dat"))
> 'data.frame':   2 obs. of  1 variable:
>  $ V1: int  1 2
>
> The latter result is indeed documented in ?read.table:
>      Unless ‘colClasses’ is specified, all columns are read as
>      character columns and then converted using ‘type.convert’ to
>      logical, integer, numeric, complex or (depending on ‘as.is’)
>      factor as appropriate.  Quotes are (by default) interpreted in all
>      fields, so a column of values like ‘"42"’ will result in an
>      integer column.
>
>
> Should the former behavior be considered a bug?
>
No. If you tell read.table the column is integer and it's actually
character on disk, it should be an error.
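
A minimal sketch of the difference, assuming the two-line file from the
report is recreated in a temporary location (the names path, strict and
raw are only for illustration). As ?read.table notes, quoting is only
considered for columns read as character, so one workaround is to read
the column as character and convert it explicitly:

```r
# Recreate the example file from the report in a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# With colClasses = "integer", scan() parses the raw field as an integer
# and rejects the quoted value; capture the error message instead of stopping.
strict <- tryCatch(
  read.table(path, colClasses = "integer"),
  error = function(e) conditionMessage(e)
)

# Reading the column as character strips the quotes, and the conversion
# to integer is then an explicit step.
raw <- read.table(path, colClasses = "character")
raw$V1 <- as.integer(raw$V1)
str(raw)
```

Note that as.integer() silently turns unparseable values into NA (with a
warning), so this is only appropriate when the column is known to hold
integers.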

> This creates problems when combined with read.table.ffdf from package
> ff, since this function tries to guess the column classes by reading the
> first rows of the file, and then passes colClasses to read.table to read
> the remaining rows by chunks. A column of quoted integers is correctly
> detected as integer in the first read, but read.table() fails in
> subsequent reads.
>
This sounds like an issue with read.table.ffdf.  The column of quoted
integers is *incorrectly* detected as integer because they're actually
character on disk.  read.table.ffdf should rely on how the data are
actually stored on disk (via as.is=TRUE), not how read.table might
convert them once they're read into R.
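
As a rough sketch of what such a detection step could look like (this is
not how read.table.ffdf is implemented; the names guessed and on_disk
are hypothetical): as.is=TRUE only stops character columns from becoming
factors, and type.convert still turns quoted digits into integers after
the quotes are stripped. One assumed way to see the on-disk form is to
also disable quote interpretation with quote="" during the guessing
pass:

```r
# Recreate the example file from the report in a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Default quote handling: quotes are stripped first, then type.convert()
# turns the remaining digits into an integer column.
guessed <- sapply(read.table(path, nrows = 1, as.is = TRUE), class)

# quote = "" keeps the quote characters in the field, so the value is not
# a valid number and the column stays character.
on_disk <- sapply(read.table(path, nrows = 1, as.is = TRUE, quote = ""),
                  class)

print(guessed)
print(on_disk)

# The classes detected on disk can be passed back as colClasses; with the
# default quote setting the character fields are read without their quotes.
full <- read.table(path, colClasses = on_disk)
```

The column then arrives as character in every chunk, and any conversion
to integer is left to the caller.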

>
> Regards

--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


