[R] read columns of quoted numbers as factors

Gabor Grothendieck ggrothendieck at gmail.com
Wed Oct 6 05:17:00 CEST 2010


On Mon, Oct 4, 2010 at 12:39 PM, james hirschorn <j_hirschorn at yahoo.com> wrote:
> Suppose I have a data file (possibly with a huge number of columns), where the
> columns with factors are coded as "1", "2", "3", etc ... The default behavior of
> read.table is to convert these columns to integer vectors.
>
> Is there a way to get read.table to recognize that columns of quoted numbers
> represent factors (while unquoted numbers are interpreted as integers), without
> explicitly setting them with colClasses ?

Although it's a bit messy, it's nevertheless only a few lines of code to
transform the quote-and-digit columns to non-numeric, read them in, and
transform them back. For example, if ! does not appear in the file, we could
insert ! characters into the quote-and-digit columns and remove them
afterwards:

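L <- readLines("myfile.dat")
L2 <- gsub('"(\\d+)"', "!\\1", L) # insert !, e.g. "12" becomes !12
# stringsAsFactors = TRUE is required in R >= 4.0.0, where the default is FALSE
DF <- read.table(textConnection(L2), header = TRUE, stringsAsFactors = TRUE)

# remove ! and rebuild the factor levels
ix <- sapply(DF, is.factor)
DF[ix] <- lapply(DF[ix], function(x) factor(gsub("!", "", x, fixed = TRUE)))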

str(DF)
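
To try the same idea without a file on disk, here is a minimal
self-contained sketch. The column names (id, grp, score) and the sample
rows are made up for illustration and simply stand in for myfile.dat; it
also assumes a version of R in which read.table accepts the text argument.

```r
# Hypothetical sample data standing in for "myfile.dat":
# grp holds quoted numbers (meant as a factor), id holds unquoted integers.
L <- c('id grp score',
       '1 "1" 2.5',
       '2 "2" 3.5',
       '3 "1" 4.0')

# Mark the quoted numbers with ! so read.table cannot parse them as numeric
L2 <- gsub('"(\\d+)"', "!\\1", L)

# stringsAsFactors = TRUE makes the marked columns come in as factors
DF <- read.table(text = L2, header = TRUE, stringsAsFactors = TRUE)

# Strip the ! marker from the factor columns and rebuild the levels
ix <- sapply(DF, is.factor)
DF[ix] <- lapply(DF[ix], function(x) factor(gsub("!", "", x, fixed = TRUE)))

str(DF)  # grp is a factor with levels "1" and "2"; id stays integer
```

Note that the rebuilt levels are sorted as character strings, so with
larger codes "10" would sort before "2".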


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


