[Rd] read.table() with quoted integers

Joshua Ulrich josh.m.ulrich at gmail.com
Fri Oct 4 13:31:49 CEST 2013


On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
>
>> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
>>> Hi!
>>>
>>>
>>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
>>> quoted integers as an acceptable value for columns for which
>>> colClasses="integer". But when colClasses is omitted, these columns are
>>> read as integer anyway.
>>>
>>> For example, let's consider a file named file.dat, containing:
>>> "1"
>>> "2"
>>>
>>>> read.table("file.dat", colClasses="integer")
>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>  scan() expected 'an integer' and got '"1"'
>>>
>>> But:
>>>> str(read.table("file.dat"))
>>> 'data.frame':   2 obs. of  1 variable:
>>> $ V1: int  1 2
>>>
>>> The latter result is indeed documented in ?read.table:
>>>     Unless ‘colClasses’ is specified, all columns are read as
>>>     character columns and then converted using ‘type.convert’ to
>>>     logical, integer, numeric, complex or (depending on ‘as.is’)
>>>     factor as appropriate.  Quotes are (by default) interpreted in all
>>>     fields, so a column of values like ‘"42"’ will result in an
>>>     integer column.
>>>
>>>
>>> Should the former behavior be considered a bug?
>>>
>> No. If you tell read.table the column is integer and it's actually
>> character on disk, it should be an error.
>
> My reading of the `read.table` help page is that when there is an
> 'integer' class and an `as.integer` function, and "integer" is the
> argument to colClasses, one should expect `as.integer` to be applied to
> the values in the column. Should I be reading elsewhere?
>
I assume you're referring to the paragraph below.

  Possible values are ‘NA’ (the default, when ‘type.convert’ is
  used), ‘"NULL"’ (when the column is skipped), one of the
  atomic vector classes (logical, integer, numeric, complex,
  character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
  Otherwise there needs to be an ‘as’ method (from package
  ‘methods’) for conversion from ‘"character"’ to the specified
  formal class.

I read that as saying an "as" method is required only for classes not
listed in the preceding sentence.  It doesn't say that an "as" method
will be applied when colClasses is one of the atomic classes, factor,
Date, or POSIXct.  I can see how you might assume that, though, since
all of those classes already have "as" methods.
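
For anyone who wants to check this locally, here is a minimal sketch. It
recreates the two-line file from the original report in a temporary
location and compares the default read, the colClasses="integer" read,
and two possible workarounds. The class name "quotedInteger" is
hypothetical, made up only to illustrate the "as" method route described
in the help text; it is not part of R. The strict read is wrapped in
tryCatch() so the script reports whatever your R version does rather
than assuming the error.

```r
library(methods)

# Recreate the file from the original report in a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Default: columns are read as character (quotes interpreted),
# then converted with type.convert().
auto <- read.table(path)

# Explicit integer class: capture the outcome instead of assuming it.
strict <- tryCatch(
  read.table(path, colClasses = "integer"),
  error = function(e) conditionMessage(e)
)

# Workaround 1: read the column as character, then convert explicitly.
chr <- read.table(path, colClasses = "character")
chr$V1 <- as.integer(chr$V1)

# Workaround 2: a formal class with an "as" method from "character".
# "quotedInteger" is a hypothetical name used only for this sketch.
setClass("quotedInteger")
setAs("character", "quotedInteger", function(from) as.integer(from))
viaAs <- read.table(path, colClasses = "quotedInteger")

str(auto)
print(strict)
str(chr)
str(viaAs)

unlink(path)
```

Both workarounds rely on the column being read as character first, which
is the case for any colClasses value that is not one of the atomic
classes.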

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


