[R] Creating factors from continuous variables
David James
djames at frontierassoc.com
Fri Aug 26 22:59:45 CEST 2005
What is the quickest way to create many categorical variables
(factors) from continuous variables?
This is the approach that I have used:
# create sample data
N <- 20
x <- runif(N,0,1)
# setup ranges to define categories
x.a <- (x >= 0.0) & (x < 0.4)
x.b <- (x >= 0.4) & (x < 0.5)
x.c <- (x >= 0.5) & (x < 0.6)
x.d <- (x >= 0.6) & (x < 1.0)
# create factors
i <- rep(1, N)  # vector of ones (what runif(N,1,1) produced), used as a multiplier
x.new <- (i*1*x.a) + (i*2*x.b) + (i*3*x.c) + (i*4*x.d)
x.factor <- factor(x.new)
I'm looking for a better / simpler / more elegant / more robust (as
the number of categories increases) way to do this. I also don't
like that my factor level names can only be numbers in this example. I
would prefer a solution to take a form like the following (inspired
by the "hist" function):
# define breakpoints
x.breaks = c(0, 0.4, 0.5, 0.6, 1.0)
x.factornames = c( "0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0" )
x.factor = unknown.function( x, x.breaks, x.factornames )
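For reference, base R's cut() appears to cover the interface sketched above. What follows is a minimal sketch, not a definitive answer: the seed, the data frame df and its columns u and v are hypothetical names added for illustration, and the choice of right = FALSE with include.lowest = TRUE is an assumption made to match the half-open ranges (x >= a) & (x < b) used earlier while still accepting a value of exactly 1.0.

```r
# Sketch: cut() bins a numeric vector into a factor with named levels.
set.seed(1)  # assumption: seed added only so the example is reproducible
N <- 20
x <- runif(N, 0, 1)

# define breakpoints and level names (same values as in the post)
x.breaks <- c(0, 0.4, 0.5, 0.6, 1.0)
x.factornames <- c("0 - 0.4", "0.4 - 0.5", "0.5 - 0.6", "0.6 - 1.0")

# right = FALSE gives intervals [a, b), like (x >= a) & (x < b);
# include.lowest = TRUE closes the last interval so x == 1.0 is not NA
x.factor <- cut(x, breaks = x.breaks, labels = x.factornames,
                right = FALSE, include.lowest = TRUE)
print(table(x.factor))

# Many variables at once: apply the same binning to every column.
# df and its column names are hypothetical.
df <- data.frame(u = runif(N), v = runif(N))
df.factors <- as.data.frame(lapply(df, cut, breaks = x.breaks,
                                   labels = x.factornames,
                                   right = FALSE, include.lowest = TRUE))
str(df.factors)
```

Values outside the outermost breakpoints come back as NA, so the breaks should span the full range of the data. If only the integer codes are wanted, findInterval(x, x.breaks) is a related base function.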
Thanks,
David
P.S. Here's what I have read to try to find the answer to my problem:
* "Introductory Statistics with R"
* "A Brief Guide to R for Beginners in Econometrics"
* "Econometrics in R"