[R] colClasses does not cause read.table to coerce to numeric; anymore?

David Winsemius dwinsemius at comcast.net
Sat Dec 14 17:50:19 CET 2013


On Dec 14, 2013, at 7:53 AM, Uwe Ligges wrote:

> David,
> 
> how should R interpret "110+"? It cannot be numeric, perhaps you have not recognized the "+" there?
> 

I specifically included the fragment of the much longer file that was throwing the error. If this behavior doesn't appear flawed to you, Uwe, then I am apparently under the misapprehension that in the past it would have been coerced to NA.
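
For anyone who needs the NA-coercion behaviour described here, the following is a minimal sketch of two possible workarounds, not a statement of what read.table is documented to do. The two-line vector `BE_demo` is a hypothetical stand-in for the longer `BE` vector in the quoted message, and the object names `raw`, `coerced` and `via_na` are made up for illustration. It assumes the only non-numeric token in the data is "110+".

```r
# Hypothetical two-line stand-in for the BE vector in the quoted message.
BE_demo <- c("   1841      109            0.00         0.00         0.00 ",
             "   1841      110+           0.00         0.00         0.00 ")

# Workaround 1: read every column as character, then coerce explicitly.
# as.numeric() turns unparseable strings such as "110+" into NA and
# issues a warning, which is suppressed here.
raw <- read.table(text = BE_demo, header = FALSE, colClasses = "character")
coerced <- suppressWarnings(as.data.frame(lapply(raw, as.numeric)))

# Workaround 2: keep colClasses = "numeric" but declare the offending
# token as a missing-value string, so scan() never tries to parse it.
via_na <- read.table(text = BE_demo, header = FALSE,
                     colClasses = "numeric",
                     na.strings = c("NA", "110+"))
```

The first approach coerces any non-numeric token, whereas the second only handles tokens listed in advance in `na.strings`, so it fails loudly on unexpected input.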

-- 
David.



> Uwe
> 
> 
> 
> 
> On 14.12.2013 01:35, David Winsemius wrote:
>> 
>> I thought that setting colClasses to "numeric" would coerce errant data to NA. Instead, read.table is throwing
>> errors. This is not what I remember from prior experience with read.table, and it is not what I read the help page as promising:
>> 
>> BE<-
>> c("   1841       96           42.26        31.50        73.75 ",
>> "   1841       97           29.56        20.78        50.34 ",
>> "   1841       98           18.71        10.59        29.30 ",
>> "   1841       99           10.48         6.23        16.71 ",
>> "   1841      100            6.14         4.23        10.37 ",
>> "   1841      101            3.31         2.06         5.38 ",
>> "   1841      102            1.50         0.83         2.34 ",
>> "   1841      103            0.33         0.05         0.38 ",
>> "   1841      104            0.00         0.00         0.00 ",
>> "   1841      105            0.00         0.00         0.00 ",
>> "   1841      106            0.00         0.00         0.00 ",
>> "   1841      107            0.00         0.00         0.00 ",
>> "   1841      108            0.00         0.00         0.00 ",
>> "   1841      109            0.00         0.00         0.00 ",
>> "   1841      110+           0.00         0.00         0.00 ",
>> "   1842        0        60290.60     62238.19    122528.79 ",
>> "   1842        1        54893.31     55849.06    110742.37 ",
>> "   1842        2        51991.87     53033.62    105025.49 ",
>> "   1842        3        49697.90     50789.01    100486.91 ",
>> "   1842        4        47598.24     48414.78     96013.02 ",
>> "   1842        5        46202.38     47106.34     93308.72"
>> )
>> #-----------
>>  BELe<-read.table(text=BE,
>>                   header=FALSE, colClasses="numeric", as.is=TRUE)
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>   scan() expected 'a real', got '110+'
>> 
>> I originally got this when reading from a file, but the error is from scan(). Was this an unfortunate side-effect of adding the `text` argument to read.table? It still persists when the character vector is passed through textConnection to the file argument:
>> 
>> BELe<-read.table(file=textConnection(BE),
>>                  header=FALSE, colClasses="numeric")
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>   scan() expected 'a real', got '110+'
>> 
>> My memory was that such coercion was effective in past years.
>> 

David Winsemius
Alameda, CA, USA


