[R] Why do we have to turn factors into characters for various functions?
savicky at cs.cas.cz
Sun Dec 12 20:12:55 CET 2010
On Sun, Dec 12, 2010 at 12:48:30AM +0200, Tal Galili wrote:
> Hello dear R-help mailing list,
> My question is *not* about how factors are implemented in R (which is, if I
> understand correctly, that factors keep numbers and assign levels to them).
> My question *is* about why so many functions that work on factors don't
> treat them as characters by default?
Personally, I try to use factors only when there is a specific reason
for them, and character vectors otherwise. Factors are natural for
categorical attributes, for the data used to build a classification
model, and for preparing input to the table() function and related tools.
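To illustrate the table() case, here is a minimal sketch. The data and
the names answers_chr and answers_fac are made up for this example. The
point is that a factor carries its full set of levels, so table() can
report a category that never occurs, which a character vector cannot do.

```r
# Hypothetical survey answers; "maybe" is an allowed category nobody chose.
answers_chr <- c("yes", "no", "yes", "yes")
answers_fac <- factor(answers_chr, levels = c("no", "maybe", "yes"))

# Character input: only the values that actually occur are counted.
tab_chr <- table(answers_chr)

# Factor input: every level is counted, including the unused "maybe".
tab_fac <- table(answers_fac)

print(tab_chr)
print(tab_fac)
```

With character input the unused category simply disappears from the
result, which may or may not be what you want.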
> Here are two simple examples:
> Example one turning the characters inside a factor into numeric:
> x <- factor(4:6)
> as.numeric(x) # output: 1 2 3
> as.numeric(as.character(x)) # output: 4 5 6 # isn't this what we wanted?
If you are concerned with computing time, then applying as.numeric()
only to the levels is probably faster, since only the distinct levels
have to be converted:
  x <- factor(rep(4:6, times=1000000))
  cpu1 <- system.time( out1 <- as.numeric(as.character(x)) )
  cpu2 <- system.time( out2 <- as.numeric(levels(x))[as.integer(x)] )
  rbind(cpu1, cpu2)

       user.self sys.self elapsed user.child sys.child
  cpu1     0.570    0.031   0.601          0         0
  cpu2     0.042    0.027   0.070          0         0
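For readers who want to see why the second form gives the same result,
here is a small sketch on a made-up four-element factor (the name
lev_num is only for illustration). The levels are converted once, and
the integer codes are then used as an index into the converted levels.

```r
x <- factor(c(5, 4, 6, 5))

levels(x)        # the distinct values, stored as character: "4" "5" "6"
as.integer(x)    # the internal codes, i.e. positions in levels(x): 2 1 3 2

# Convert only the three levels, not all four elements.
lev_num <- as.numeric(levels(x))

# Look up each element's value through its integer code.
out <- lev_num[as.integer(x)]

# Same result as the element-by-element conversion.
stopifnot(identical(out, as.numeric(as.character(x))))
```

On a long vector with few distinct values, the string-to-number
conversion is done a handful of times instead of once per element,
which is presumably where the time difference above comes from.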
> Is it that implementing a switch of factors to characters as the default in
> some of the basic functions will cause old code to break?
I think that this is an important part of the reason.