[Rd] read.table() with quoted integers
Milan Bouchet-Valat
nalimilan at club.fr
Fri Oct 4 16:01:46 CEST 2013
On Friday, 4 October 2013 at 07:55 -0400, Duncan Murdoch wrote:
> On 13-10-04 7:31 AM, Joshua Ulrich wrote:
> > On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> >>
> >> On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
> >>
> >>> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> >>>> Hi!
> >>>>
> >>>>
> >>>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
> >>>> quoted integers as an acceptable value for columns for which
> >>>> colClasses="integer". But when colClasses is omitted, these columns are
> >>>> read as integer anyway.
> >>>>
> >>>> For example, let's consider a file named file.dat, containing:
> >>>> "1"
> >>>> "2"
> >>>>
> >>>>> read.table("file.dat", colClasses="integer")
> >>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> >>>> scan() expected 'an integer' and got '"1"'
> >>>>
> >>>> But:
> >>>>> str(read.table("file.dat"))
> >>>> 'data.frame': 2 obs. of 1 variable:
> >>>> $ V1: int 1 2
> >>>>
> >>>> The latter result is indeed documented in ?read.table:
> >>>> Unless ‘colClasses’ is specified, all columns are read as
> >>>> character columns and then converted using ‘type.convert’ to
> >>>> logical, integer, numeric, complex or (depending on ‘as.is’)
> >>>> factor as appropriate. Quotes are (by default) interpreted in all
> >>>> fields, so a column of values like ‘"42"’ will result in an
> >>>> integer column.
> >>>>
> >>>>
> >>>> Should the former behavior be considered a bug?
> >>>>
> >>> No. If you tell read.table the column is integer and it's actually
> >>> character on disk, it should be an error.
> >>
> >> My reading of the `read.table` help page is that one should expect that when
> >> there is an 'integer'-class and an `as.integer` function and "integer" is the
> >> argument to colClasses, that `as.integer` will be applied to the values in the
> >> column. Should I be reading elsewhere?
> >>
> > I assume you're referring to the paragraph below.
> >
> > Possible values are ‘NA’ (the default, when ‘type.convert’ is
> > used), ‘"NULL"’ (when the column is skipped), one of the
> > atomic vector classes (logical, integer, numeric, complex,
> > character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
> > Otherwise there needs to be an ‘as’ method (from package
> > ‘methods’) for conversion from ‘"character"’ to the specified
> > formal class.
> >
> > I read that as meaning that an "as" method is required for classes not
> > already listed in the prior sentence. It doesn't say an "as" method
> > will be applied if colClasses is one of the atomic, factor, Date, or
> > POSIXct classes; but I can see how you might assume that, since all
> > the atomic, factor, Date, and POSIXct classes already have "as"
> > methods...
>
> And this does suggest a workaround for ffdf: instead of declaring the
> class to be "integer", declare a class "ffdf_integer", and write a
> conversion method. Or simply read everything as character and call
> as.integer() explicitly.
This is indeed an interesting workaround for read.table.ffdf(), thanks!
I still think that making scan() accept quoted integers when an integer
column is requested would be a useful improvement for R users, though.
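
For the archives, here is a minimal sketch of both suggestions, using plain
read.table() rather than read.table.ffdf() (which I have not tested here).
The class name "ffdf_integer" is only the hypothetical name proposed above,
and the temporary file stands in for file.dat from my original report.

```r
library(methods)

# Recreate the two-line example file with quoted integers.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Workaround 1: declare a placeholder class (hypothetical name) and a
# coercion from character. For class names it does not know, read.table()
# reads the column as character and then converts it with methods::as().
setClass("ffdf_integer")
setAs("character", "ffdf_integer", function(from) as.integer(from))
df1 <- read.table(path, colClasses = "ffdf_integer")

# Workaround 2: read the column as character (quotes are stripped by
# default), then convert explicitly.
df2 <- read.table(path, colClasses = "character")
df2$V1 <- as.integer(df2$V1)

str(df1)
str(df2)
```

In both cases the quotes are handled while the field is read as character,
so the integer conversion only ever sees the bare digits.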
Regards