[Rd] read.table() with quoted integers

David Winsemius dwinsemius at comcast.net
Tue Oct 1 18:29:07 CEST 2013


On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:

> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
>> Hi!
>> 
>> 
>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
>> quoted integers as an acceptable value for columns for which
>> colClasses="integer". But when colClasses is omitted, these columns are
>> read as integer anyway.
>> 
>> For example, let's consider a file named file.dat, containing:
>> "1"
>> "2"
>> 
>> > read.table("file.dat", colClasses="integer")
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>  scan() expected 'an integer' and got '"1"'
>> 
>> But:
>> > str(read.table("file.dat"))
>> 'data.frame':   2 obs. of  1 variable:
>> $ V1: int  1 2
>> 
>> The latter result is indeed documented in ?read.table:
>>     Unless ‘colClasses’ is specified, all columns are read as
>>     character columns and then converted using ‘type.convert’ to
>>     logical, integer, numeric, complex or (depending on ‘as.is’)
>>     factor as appropriate.  Quotes are (by default) interpreted in all
>>     fields, so a column of values like ‘"42"’ will result in an
>>     integer column.
>> 
>> 
>> Should the former behavior be considered a bug?
>> 
> No. If you tell read.table the column is integer and it's actually
> character on disk, it should be an error.

My reading of the `read.table` help page is that, since there is an "integer" class and an `as.integer` function, passing "integer" as the argument to colClasses should cause `as.integer` to be applied to the values in the column. Should I be reading elsewhere?
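
For concreteness, here is a minimal sketch of the two behaviours under discussion, plus two possible workarounds. It assumes the two-line file from the original report, written to a temporary path. The class name "quotedInt" is hypothetical and exists only for this illustration. The error in the second step is the one reported for R 3.0.1 and may not occur in every version, so it is caught rather than assumed.

```r
# Recreate the file from the original report: two quoted integers.
tmp <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), tmp)

# No colClasses: the column is read as character, then type.convert() is applied.
auto <- read.table(tmp)
str(auto)

# colClasses = "integer": scan() is asked for integers directly. In R 3.0.1 this
# was reported to fail on the quoted field, so capture the message if it errors.
strict <- tryCatch(read.table(tmp, colClasses = "integer"),
                   error = function(e) conditionMessage(e))

# Workaround 1: read as character (quotes are stripped), then convert explicitly.
chr <- read.table(tmp, colClasses = "character")
chr$V1 <- as.integer(chr$V1)

# Workaround 2: a hypothetical class "quotedInt" with an as() method from
# character, which read.table() applies after reading the column as character.
setClass("quotedInt")
setAs("character", "quotedInt", function(from) as.integer(from))
viaAs <- read.table(tmp, colClasses = "quotedInt")
```

Both workarounds rely on the column first being read as character, which appears to be the only case in which quotes are interpreted.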

-- 
David.

> 
>> This creates problems when combined with read.table.ffdf from package
>> ff, since this function tries to guess the column classes by reading the
>> first rows of the file, and then passes colClasses to read.table to read
>> the remaining rows by chunks. A column of quoted integers is correctly
>> detected as integer in the first read, but read.table() fails in
>> subsequent reads.
>> 
> This sounds like an issue with read.table.ffdf.  The column of quoted
> integers is *incorrectly* detected as integer because they're actually
> character on disk.  read.table.ffdf should rely on how the data are
> actually stored on disk (via as.is=TRUE), not how read.table might
> convert them once they're read into R.
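
One way to picture the chunked reading described above is the following sketch. It is not how read.table.ffdf is implemented; it only illustrates guessing classes from the first rows as the data are stored on disk, here by disabling quote interpretation with quote = "" (an assumption of this sketch, not something proposed in the thread), and then reusing those classes for the later rows. The four-line file is made up for the example.

```r
# Hypothetical file of quoted integers, standing in for a larger data set.
tmp <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"', '"3"', '"4"'), tmp)

# First chunk: with quote = "" the fields keep their quote characters, so the
# column is detected as character rather than integer.
first <- read.table(tmp, nrows = 2, quote = "", stringsAsFactors = FALSE)
classes <- unname(vapply(first, function(x) class(x)[1], character(1)))

# Remaining rows: reuse the guessed classes. Quotes are stripped here because
# the column is read as character with the default quote setting.
rest <- read.table(tmp, skip = 2, colClasses = classes)

# Convert explicitly once the values are in R.
rest$V1 <- as.integer(rest$V1)
```

With the classes guessed this way, the later chunks are read with colClasses = "character", so the scan() error from the original report does not arise.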
> 
>> 
>> Regards
>> 
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> --
> Joshua Ulrich  |  about.me/joshuaulrich
> FOSS Trading  |  www.fosstrading.com
> 

David Winsemius
Alameda, CA, USA


