[Rd] read.table() with quoted integers
peter dalgaard
pdalgd at gmail.com
Fri Oct 4 18:15:14 CEST 2013
On Oct 4, 2013, at 17:10, Henrik Bengtsson wrote:
> On Fri, Oct 4, 2013 at 4:55 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> On 13-10-04 7:31 AM, Joshua Ulrich wrote:
>>>
>>> On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>
>>>>
>>>> On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
>>>>
>>>>> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>>
>>>>>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
>>>>>> quoted integers as an acceptable value for columns for which
>>>>>> colClasses="integer". But when colClasses is omitted, these columns are
>>>>>> read as integer anyway.
>>>>>>
>>>>>> For example, let's consider a file named file.dat, containing:
>>>>>> "1"
>>>>>> "2"
>>>>>>
>>>>>>> read.table("file.dat", colClasses="integer")
>>>>>>
>>>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>>>>>> na.strings, :
>>>>>> scan() expected 'an integer' and got '"1"'
>>>>>>
>>>>>> But:
>>>>>>>
>>>>>>> str(read.table("file.dat"))
>>>>>>
>>>>>> 'data.frame': 2 obs. of 1 variable:
>>>>>> $ V1: int 1 2
>>>>>>
>>>>>> The latter result is indeed documented in ?read.table:
>>>>>> Unless ‘colClasses’ is specified, all columns are read as
>>>>>> character columns and then converted using ‘type.convert’ to
>>>>>> logical, integer, numeric, complex or (depending on ‘as.is’)
>>>>>> factor as appropriate. Quotes are (by default) interpreted in all
>>>>>> fields, so a column of values like ‘"42"’ will result in an
>>>>>> integer column.
>>>>>>
>>>>>>
>>>>>> Should the former behavior be considered a bug?
>>>>>>
>>>>> No. If you tell read.table the column is integer and it's actually
>>>>> character on disk, it should be an error.
>>>>
>>>>
>>>> My reading of the `read.table` help page is that, since there is an
>>>> 'integer' class and an `as.integer` function, when "integer" is the
>>>> argument to colClasses one should expect `as.integer` to be applied to
>>>> the values in the column. Should I be reading elsewhere?
>>>>
>>> I assume you're referring to the paragraph below.
>>>
>>> Possible values are ‘NA’ (the default, when ‘type.convert’ is
>>> used), ‘"NULL"’ (when the column is skipped), one of the
>>> atomic vector classes (logical, integer, numeric, complex,
>>> character, raw), or ‘"factor"’, ‘"Date"’ or ‘"POSIXct"’.
>>> Otherwise there needs to be an ‘as’ method (from package
>>> ‘methods’) for conversion from ‘"character"’ to the specified
>>> formal class.
>>>
>>> I read that as meaning that an "as" method is required for classes not
>>> already listed in the prior sentence. It doesn't say an "as" method
>>> will be applied if colClasses is one of the atomic, factor, Date, or
>>> POSIXct classes; but I can see how you might assume that, since all
>>> the atomic, factor, Date, and POSIXct classes already have "as"
>>> methods...
>>
>>
>> And this does suggest a workaround for ffdf: instead of declaring the class
>> to be "integer", declare a class "ffdf_integer", and write a conversion
>> method. Or simply read everything as character and call as.integer()
>> explicitly.
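For what it is worth, the conversion-method route could look roughly like the sketch below. The class name "quotedInteger" is made up for illustration, and text= stands in for the real file. Since as.integer() does the conversion here, a non-numeric string becomes NA with a warning rather than an error.

```r
library(methods)

# "quotedInteger" is a hypothetical class name, used only for this sketch.
setClass("quotedInteger")

# Conversion method that read.table() calls (via methods::as) for columns
# whose colClasses entry is not one of the built-in classes.
setAs("character", "quotedInteger", function(from) as.integer(from))

# The column is first read as character, so the quotes are interpreted,
# and then converted by the method above.
df <- read.table(text = '"1"\n"2"', colClasses = "quotedInteger")
str(df)
```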
>
> Just a note of concert since several proposed it:
concerN?
> colClasses="character" followed by as.integer() or strtoi() misses
> the validation, e.g. "foo" will be turned into NA_integer_. Using
> read.table() or scan() gives an error.
The obvious fix for that would seem to be to use scan() on the character vector:
> y <- c("1","2",3,4,5)
> y
[1] "1" "2" "3" "4" "5"
> scan(text=y)
Read 5 items
[1] 1 2 3 4 5
> y <- c("1","2",3,4,"NA")
> scan(text=y)
Read 5 items
[1] 1 2 3 4 NA
> y <- c("1","2",3,4,"foo")
> scan(text=y)
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'a real', got 'foo'
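Note that scan() defaults to what=double(), hence the 'a real' in the message above. If the target really is integer, something along these lines might do, as a minimal sketch. strict_as_integer() is a made-up name, not an existing function, and the length check is my own addition, since scan() splits on whitespace and skips blank lines:

```r
# Hypothetical helper (not part of base R): convert a character vector to
# integer, signalling an error on anything scan() cannot parse as an integer.
strict_as_integer <- function(x) {
  out <- scan(text = x, what = integer(), quiet = TRUE)
  # scan() splits on whitespace and skips blank lines, so guard the length
  if (length(out) != length(x))
    stop("each input element must hold exactly one value")
  out
}

strict_as_integer(c("1", "2", "NA"))   # "NA" is read as a missing value

# Non-integer input raises an error instead of quietly becoming NA
msg <- tryCatch(strict_as_integer(c("1", "foo")),
                error = function(e) conditionMessage(e))
msg
```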
>
> /Henrik
>
>>
>> Duncan Murdoch
>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com