[R] Questions on factors in regression analysis
guox at ucalgary.ca
Thu Aug 20 19:46:44 CEST 2009
I have two questions on factors in regression:
Q1.
In a table, there are a few categorical (factor) variables and a few
numerical variables, and the response variable is numeric. Some factors are
important but others are not.
How can I determine which categorical variables are significant for the
response variable?
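To make Q1 concrete, here is a minimal sketch of the kind of check I have in
mind. The data are made up (f1, f2, z and y are hypothetical names, not from
my real table), and drop1 with an F test is only one possible approach; I am
not sure it is the recommended one.

```r
## Hypothetical data: factor f1 affects the response, factor f2 does not
set.seed(1)
n  <- 60
f1 <- gl(3, 20, n, labels = c("a", "b", "c"))  ## three-level factor
f2 <- gl(2, 1, n, labels = c("u", "v"))        ## two-level factor, alternating
z  <- rnorm(n)                                 ## numeric predictor
y  <- 2 * z + c(0, 3, 6)[f1] + rnorm(n)        ## response depends on z and f1 only

fit <- lm(y ~ z + f1 + f2)

## F test for dropping each term in turn; each factor is tested as a whole,
## not level by level
tab <- drop1(fit, test = "F")
print(tab)
```

The "Pr(>F)" column of tab then gives one p-value per term, with the factor
f1 tested on 2 degrees of freedom.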
Q2.
As we know, lm can deal with categorical variables.
I thought that, when there is a categorical predictor, we may use lm directly
without quantifying the factor, and that assigning different numeric values
to the factor levels would not change the fit, as shown:
x <- 1:20 ## numeric predictor
yes.no <- c("yes","no")
factors <- gl(2,10,20,yes.no) ##factor predictor
factors.quant <- rep(c(18.8,29.9),c(10,10)) ## quantification of factors
factors.quant.1 <- rep(c(16.9,38.9),c(10,10)) ## second quantification of factors
response <- 0.8*x + 18 + factors.quant + rnorm(20) ##response
lm.quant <- lm(response ~ x + factors.quant) ##lm with quantifications
lm.fact <- lm(response ~ x + factors) ##lm with factors
lm.quant.1 <- lm(response ~ x + factors.quant.1) ##lm with quantifications
lm.fact.1 <- lm(response ~ x + factors) ##lm with factors
par(mfrow=c(2,2)) ## comparisons of two fittings
plot(x, response)
lines(x,fitted(lm.quant),col="blue")
grid()
plot(x,response)
lines(x,fitted(lm.fact),col = "red")
grid()
plot(x, response)
lines(x,fitted(lm.quant.1),lty =2,col="blue")
grid()
plot(x,response)
lines(x,fitted(lm.fact.1),lty =2,col = "red")
grid()
par(mfrow = c(1,1))
So, is it right that we can assign any numeric values to the levels of a
factor, for example c(yes, no) = c(18.8, 29.9) or c(16.9, 38.9) as above,
before calling lm, glm, aov, or even nls?
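Instead of comparing the plots by eye, the fitted values can be compared
numerically. The sketch below repeats the two-level setup with shorter names
(f, q1, q2 are just abbreviations of the objects above). The three-level part
(g, x3, y3) is an extra case I made up for illustration, where replacing the
factor by its integer codes forces equally spaced level effects.

```r
set.seed(2)
x  <- 1:20
f  <- gl(2, 10, 20, labels = c("yes", "no"))   ## two-level factor
q1 <- rep(c(18.8, 29.9), c(10, 10))            ## first numeric coding
q2 <- rep(c(16.9, 38.9), c(10, 10))            ## second numeric coding
y  <- 0.8 * x + 18 + q1 + rnorm(20)

fit.f  <- lm(y ~ x + f)
fit.q1 <- lm(y ~ x + q1)
fit.q2 <- lm(y ~ x + q2)

## Two levels: any two distinct codes span the same space as the dummy variable
same2 <- isTRUE(all.equal(fitted(fit.f), fitted(fit.q1))) &&
         isTRUE(all.equal(fitted(fit.f), fitted(fit.q2)))

## Hypothetical three-level case with unequally spaced level effects
x3 <- 1:30
g  <- gl(3, 10, 30, labels = c("lo", "mid", "hi"))
y3 <- 0.8 * x3 + c(0, 10, 3)[g] + rnorm(30)

fit.g   <- lm(y3 ~ x3 + g)               ## factor: one coefficient per extra level
fit.num <- lm(y3 ~ x3 + as.numeric(g))   ## codes 1, 2, 3: a single slope

same3 <- isTRUE(all.equal(fitted(fit.g), fitted(fit.num)))
print(c(two.levels = same2, three.levels = same3))
```

If I understand it correctly, the two-level fits agree only because a single
dummy variable and any two-valued numeric coding differ by a linear
rescaling; with three or more levels the numeric coding is a more restrictive
model than the factor.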
Please drop a few lines and/or direct me to some references. Thanks,
-james