[Rd] read.table() with quoted integers
Henrik Bengtsson
hb at biostat.ucsf.edu
Mon Sep 30 17:50:53 CEST 2013
On Mon, Sep 30, 2013 at 5:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> Hi!
>
>
> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not accept
> quoted integers in columns for which colClasses="integer". Yet when
> colClasses is omitted, these columns are read as integer anyway.
>
> For example, let's consider a file named file.dat, containing:
> "1"
> "2"
>
>> read.table("file.dat", colClasses="integer")
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> scan() expected 'an integer' and got '"1"'
>
> But:
>> str(read.table("file.dat"))
> 'data.frame': 2 obs. of 1 variable:
> $ V1: int 1 2
>
> The latter result is indeed documented in ?read.table:
> Unless ‘colClasses’ is specified, all columns are read as
> character columns and then converted using ‘type.convert’ to
> logical, integer, numeric, complex or (depending on ‘as.is’)
> factor as appropriate. Quotes are (by default) interpreted in all
> fields, so a column of values like ‘"42"’ will result in an
> integer column.
>
>
> Should the former behavior be considered a bug?
>
> This creates problems when combined with read.table.ffdf from package
> ff, since this function tries to guess the column classes by reading the
> first rows of the file, and then passes colClasses to read.table to read
> the remaining rows by chunks. A column of quoted integers is correctly
> detected as integer in the first read, but read.table() fails in
> subsequent reads.
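The report can be reproduced in a self-contained way. The sketch below writes the two-line file to a temporary path, reads it once without colClasses, derives the classes from the first row (a simplified, hypothetical stand-in for the guessing that read.table.ffdf() does, not ff's actual code), and then re-reads with those classes. Because scan()'s handling of quoted numeric fields may differ between R versions, the final read is wrapped in tryCatch() rather than assumed to fail.

```r
# Write the two-line example file from the report to a temporary path.
f <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), f)

# No colClasses: fields are read as character (quotes stripped) and then
# converted by type.convert(), so V1 becomes an integer column.
guess <- read.table(f)
str(guess)

# Simplified, hypothetical version of "guess classes from the first rows,
# then reuse them for the remaining chunks" (not ff's actual code).
classes <- vapply(read.table(f, nrows = 1), class, character(1))
print(classes)

# Re-read with the guessed classes. In R 3.0.1 this was reported to fail
# with: scan() expected 'an integer' and got '"1"'. The outcome is captured
# instead of assumed, since it may differ between R versions.
res <- tryCatch(read.table(f, colClasses = classes),
                error = function(e) conditionMessage(e))
print(res)
```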
The readDataFrame() method of the R.filesets package provides the argument
'trimQuotes' for exactly this reason, i.e. to trim the quotes from columns
for which 'colClasses' specifies a numeric type before the data are passed
on to read.table(). Feel free to borrow from its source code for a patch to
ff::read.table.ffdf(). The workaround is in readDataFrame() for
TabularTextFile
[https://r-forge.r-project.org/scm/viewvc.php/pkg/R.filesets/R/TabularTextFile.R?view=markup&root=r-dots];
look for the part that starts with:
# SPECIAL CASE/WORKAROUND: read.table()/scan() will give an error
# if a numeric value is quoted and 'colClasses' specifies it as
# a numeric value. In order to read such values, we need to remove
# the quotes first. /HB 2011-07-13
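R.filesets removes the quotes before read.table() ever sees them. A different route to the same end, sketched below, is to read every column as character (scan() does interpret quotes in character fields) and then coerce each column to the class requested in colClasses. The helper name read_table_quoted() is hypothetical: it is neither the R.filesets code nor part of ff, it matches colClasses by position, and it only handles a few basic classes.

```r
# Hypothetical helper (not R.filesets' trimQuotes code): read all columns
# as character, where quotes are honoured, then coerce each column to the
# class given in 'colClasses' (matched by position, recycled if shorter).
read_table_quoted <- function(file, colClasses, ...) {
  df <- read.table(file, colClasses = "character", ...)
  colClasses <- rep_len(unname(colClasses), ncol(df))
  for (i in seq_along(df)) {
    df[[i]] <- switch(colClasses[i],
      integer   = as.integer(df[[i]]),
      numeric   = ,                      # falls through to 'double'
      double    = as.numeric(df[[i]]),
      logical   = as.logical(df[[i]]),
      complex   = as.complex(df[[i]]),
      character = df[[i]],
      stop("unsupported class in this sketch: ", colClasses[i])
    )
  }
  df
}

# Usage on the example file from the report.
f <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), f)
d <- read_table_quoted(f, colClasses = "integer")
str(d)
```

Reading everything as character costs some memory for large chunks, but it avoids splitting and rewriting the raw lines by hand.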
/Henrik
(author of R.filesets)
>
>
> Regards
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel