[Rd] read.table() with quoted integers

Duncan Murdoch murdoch.duncan at gmail.com
Fri Oct 4 13:55:43 CEST 2013


On 13-10-04 7:31 AM, Joshua Ulrich wrote:
> On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
>>
>>> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
>>>> Hi!
>>>>
>>>>
>>>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
>>>> quoted integers as an acceptable value for columns for which
>>>> colClasses="integer". But when colClasses is omitted, these columns are
>>>> read as integer anyway.
>>>>
>>>> For example, let's consider a file named file.dat, containing:
>>>> "1"
>>>> "2"
>>>>
>>>>> read.table("file.dat", colClasses="integer")
>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>>   scan() expected 'an integer' and got '"1"'
>>>>
>>>> But:
>>>>> str(read.table("file.dat"))
>>>> 'data.frame':   2 obs. of  1 variable:
>>>> $ V1: int  1 2
>>>>
>>>> The latter result is indeed documented in ?read.table:
>>>>      Unless ‘colClasses’ is specified, all columns are read as
>>>>      character columns and then converted using ‘type.convert’ to
>>>>      logical, integer, numeric, complex or (depending on ‘as.is’)
>>>>      factor as appropriate.  Quotes are (by default) interpreted in all
>>>>      fields, so a column of values like ‘"42"’ will result in an
>>>>      integer column.
>>>>
>>>>
>>>> Should the former behavior be considered a bug?
>>>>
>>> No. If you tell read.table the column is integer and it's actually
>>> character on disk, it should be an error.
>>
>> My reading of the `read.table` help page is that one should expect that when
>> there is an 'integer'-class and an  `as.integer` function and  "integer" is the
>> argument to colClasses, that `as.integer` will be applied to the values in the
>> column. Should I be reading elsewhere?
>>
> I assume you're referring to the paragraph below.
>
>    Possible values are ‘NA’ (the default, when ‘type.convert’ is
>    used), ‘"NULL"’ (when the column is skipped), one of the
>    atomic vector classes (logical, integer, numeric, complex,
>    character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
>    Otherwise there needs to be an ‘as’ method (from package
>    ‘methods’) for conversion from ‘"character"’ to the specified
>    formal class.
>
> I read that as meaning that an "as" method is required for classes not
> already listed in the prior sentence.  It doesn't say an "as" method
> will be applied if colClasses is one of the atomic, factor, Date, or
> POSIXct classes; but I can see how you might assume that, since all
> the atomic, factor, Date, and POSIXct classes already have "as"
> methods...

And this does suggest a workaround for ffdf: instead of declaring the
class to be "integer", declare a class "ffdf_integer" and write an
"as" conversion method from "character" for it. Or simply read
everything as character and call as.integer() explicitly.
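
A minimal sketch of both workarounds, assuming plain read.table() rather
than the ffdf reader (which I have not tested here). The temporary file
recreates the two-line example from the original report; the class name
"ffdf_integer" is only a placeholder, and the class is declared without
slots purely so that setAs() has a target to register against.

```r
library(methods)

# Recreate the example file from the thread: two quoted integers.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Workaround 1: declare a placeholder class and a coercion method from
# "character". read.table() reads the column as character (stripping
# the quotes) and then converts it with methods::as().
setClass("ffdf_integer")
setAs("character", "ffdf_integer", function(from) as.integer(from))
via_class <- read.table(path, colClasses = "ffdf_integer")

# Workaround 2: read the column as character, then convert explicitly.
via_char <- read.table(path, colClasses = "character")
via_char$V1 <- as.integer(via_char$V1)

str(via_class)
str(via_char)
```

Both should give an integer column V1. The first keeps the conversion
inside the read.table() call; the second is easier to follow and needs
no class definitions.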

Duncan Murdoch


