[R] retaining characters in a csv file

David Winsemius dwinsemius at comcast.net
Wed Sep 23 00:58:08 CEST 2015


On Sep 22, 2015, at 3:00 PM, Therneau, Terry M., Ph.D. wrote:

> I have a csv file from an automatic process (so this will happen thousands of times), for which the first row is a vector of variable names and the second row often starts something like this:
> 
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .....
> 
> Notice the second variable which is
>      a character string (note the quotation marks)
>      a sequence of numeric digits
>      leading zeros are significant
> 
> The read.csv function insists on turning this into a numeric.  Is there any simple set of options that
> will turn this behavior off?  I'm looking for a way to tell it to "obey the bloody quotes" -- I still want the first, third, etc columns to become numeric.  There can be more than one variable like this, and not always in the second position.

Not knowing in advance which columns are affected may force you to read everything in as character, but if you can pass a colClasses argument through to read.csv, this might help:

> read.csv(text='5724550,"000202075214",2005.02.17,2005.02.17,"F"', stringsAsFactors=FALSE, header=FALSE, colClasses=c("numeric", rep("character", 4)))
       V1           V2         V3         V4 V5
1 5724550 000202075214 2005.02.17 2005.02.17  F
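
Since the real file has a header row, colClasses can also be given as a named vector, so the position of the quoted column stops mattering. This is only a sketch: the column names below (id, code, start, end, sex) are made up for illustration, as the original post does not show the header.

```r
# Hypothetical header row; the real variable names are not shown in the post.
txt <- 'id,code,start,end,sex
5724550,"000202075214",2005.02.17,2005.02.17,"F"'

# Named colClasses are matched against the header; unnamed columns are
# left to the usual type conversion. sex is listed too because a column
# holding only "F" would otherwise be converted to logical.
dat <- read.csv(text = txt, stringsAsFactors = FALSE,
                colClasses = c(code = "character", sex = "character"))

sapply(dat, class)
nchar(dat$code)   # leading zeros are kept if this equals 12
```

This only helps when the names of the affected variables are known ahead of time, even if their positions are not.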

Or you can define a class with a coercion method (via setAs) and name that class in colClasses:

> setClass('myChar')
> setAs('character', 'myChar', def=function(from, to ) to <- I(from))
> read.csv(text='5724550,"000202075214",2005.02.17,2005.02.17,"F"', stringsAsFactors=FALSE, header=FALSE, colClasses=c("numeric", rep('myChar',4)) )
       V1           V2         V3         V4 V5
1 5724550 000202075214 2005.02.17 2005.02.17  F

(Neither the third nor the fourth column makes sense as numeric, so here is an illustration of coercion to Date.)

> setClass('dotDate')
> setAs('character', 'dotDate', def=function(from, to ) to <- as.Date(from, "%Y.%m.%d")  )

> read.csv(text='5724550,"000202075214",2005.02.17,2005.02.17,"F"', stringsAsFactors=FALSE, header=FALSE, colClasses=c("numeric", "character", rep('dotDate',2), "character") )
       V1           V2         V3         V4 V5
1 5724550 000202075214 2005-02-17 2005-02-17  F
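
If neither the positions nor the names of the quoted columns are known, one possibility is to peek at the first data row and build colClasses from the fields that begin with a quotation mark. The helper below, quoted_col_classes, is a hypothetical name and not part of any package, and the header names are again invented. It assumes the raw text is available before read.csv is called (which may not be the case inside httr) and that no quoted field contains an embedded comma.

```r
# Hypothetical helper: mark a column "character" when its field in the
# first data row starts with a double quote, and NA (default conversion)
# otherwise.
quoted_col_classes <- function(lines, header = TRUE) {
  first <- lines[if (header) 2L else 1L]
  # quote = "" turns quote processing off, so the quote marks are kept.
  fields <- scan(text = first, what = "character", sep = ",", quote = "",
                 strip.white = TRUE, quiet = TRUE)
  ifelse(grepl('^"', fields), "character", NA_character_)
}

lines <- c('id,code,start,end,sex',
           '5724550,"000202075214",2005.02.17,2005.02.17,"F"')

cc  <- quoted_col_classes(lines)
dat <- read.csv(text = lines, stringsAsFactors = FALSE, colClasses = cc)
sapply(dat, class)
```

The unquoted columns still go through the default conversion, so the first column stays numeric while the quoted ones are left as character.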


> 
> This happens deep inside the httr library; there is an easy way for me to add more options to the read.csv call but it is not so easy to replace it with something else.
> 
> Terry T
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA


