[R] Questions on factors in regression analysis

guox at ucalgary.ca
Thu Aug 20 19:46:44 CEST 2009


I have two questions on factors in regression:

Q1.
I have a table with a few categorical (factor) variables, a few numerical
variables, and a numeric response variable. Some factors are important,
but others are not.
How can I determine which categorical variables have a significant effect
on the response variable?
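
One possible way to frame Q1, sketched on made-up data: fit all predictors
together and test each term as a whole with drop1(), so that a factor with
several levels gets a single F test instead of one t test per level. The
data frame dat and the variables f1, f2, z and y are hypothetical and are
not taken from my real table.

```r
set.seed(3)  # assumed seed, only for reproducibility
n <- 120

# Hypothetical data: f1 shifts the response, f2 is unrelated to it
dat <- data.frame(
  f1 = gl(3, 40, n, c("a", "b", "c")),
  f2 = factor(sample(c("u", "v"), n, replace = TRUE)),
  z  = rnorm(n)
)
dat$y <- 2 * dat$z + c(0, 3, 6)[as.integer(dat$f1)] + rnorm(n)

fit <- lm(y ~ f1 + f2 + z, data = dat)

# One F test per term: each row compares the full model
# with the model that omits that term
tab <- drop1(fit, test = "F")
print(tab)
```

Whether this is the appropriate notion of "significant" presumably depends
on the data and the model, so it is only a starting point.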

Q2.
As we know, lm can deal with categorical variables.
I thought that, when there is a categorical predictor, we may use lm directly
without quantifying the factor, and that assigning different numeric values
to the factor levels would not change the fit, as shown:

x <- 1:20                                          ## numeric predictor
yes.no <- c("yes", "no")
factors <- gl(2, 10, 20, yes.no)                   ## factor predictor
factors.quant <- rep(c(18.8, 29.9), c(10, 10))     ## first quantification of the factor
factors.quant.1 <- rep(c(16.9, 38.9), c(10, 10))   ## second quantification of the factor
response <- 0.8*x + 18 + factors.quant + rnorm(20) ## response
lm.quant <- lm(response ~ x + factors.quant)       ## lm with the first quantification
lm.fact <- lm(response ~ x + factors)              ## lm with the factor

lm.quant.1 <- lm(response ~ x + factors.quant.1)   ## lm with the second quantification
lm.fact.1 <- lm(response ~ x + factors)            ## lm with the factor (same model as lm.fact)

par(mfrow=c(2,2)) ## comparisons of two fittings
plot(x, response)
lines(x,fitted(lm.quant),col="blue")
grid()
plot(x,response)
lines(x,fitted(lm.fact),col = "red")
grid()
plot(x, response)
lines(x,fitted(lm.quant.1),lty =2,col="blue")
grid()
plot(x,response)
lines(x,fitted(lm.fact.1),lty =2,col = "red")
grid()
par(mfrow = c(1,1))
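
The same comparison can also be made numerically instead of by eye. The
sketch below rebuilds the example with an assumed seed and checks, via
all.equal(), that the fitted values agree up to numerical tolerance. The
names fit.fact, fit.quant, fit.quant.1, same.fit and same.fit.1 are
introduced here only for illustration.

```r
set.seed(1)  # assumed seed, only for reproducibility
x <- 1:20
factors <- gl(2, 10, 20, c("yes", "no"))
factors.quant   <- rep(c(18.8, 29.9), c(10, 10))
factors.quant.1 <- rep(c(16.9, 38.9), c(10, 10))
response <- 0.8 * x + 18 + factors.quant + rnorm(20)

fit.fact    <- lm(response ~ x + factors)
fit.quant   <- lm(response ~ x + factors.quant)
fit.quant.1 <- lm(response ~ x + factors.quant.1)

# Compare fitted values of the factor model with each numeric coding
same.fit   <- isTRUE(all.equal(fitted(fit.fact), fitted(fit.quant)))
same.fit.1 <- isTRUE(all.equal(fitted(fit.fact), fitted(fit.quant.1)))
print(c(same.fit, same.fit.1))

# The slope on the coded variable depends on the coding, even though
# the fitted values do not
print(coef(fit.fact)["factorsno"])
print(coef(fit.quant)["factors.quant"])
print(coef(fit.quant.1)["factors.quant.1"])
```

With only two levels, any two distinct codes span the same column space as
the 0/1 dummy variable, which is why the fitted values coincide while the
coefficients are rescaled.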

So, is it right that we can assign any numeric values to the factor levels,
for example c(yes, no) = c(18.8, 29.9) or c(16.9, 38.9) as above,
before calling lm, glm, aov, or even nls?
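
The two-level case may be special. As a sketch with made-up data (the
three-level factor grp, the codes 1, 2, 10 and the level effects 5, 20, 8
are assumptions and not part of the example above), the factor model
estimates one coefficient per non-reference level, while a numeric coding
forces the level effects to be linear in the chosen codes:

```r
set.seed(2)  # assumed seed, only for reproducibility
x <- 1:30
grp <- gl(3, 10, 30, c("low", "mid", "high"))   # hypothetical 3-level factor
grp.quant <- c(1, 2, 10)[as.integer(grp)]       # arbitrary numeric codes

# Level effects (5, 20, 8) are deliberately not linear in the codes
response <- 0.8 * x + c(5, 20, 8)[as.integer(grp)] + rnorm(30)

fit.fact  <- lm(response ~ x + grp)         # intercept, x, two dummy variables
fit.quant <- lm(response ~ x + grp.quant)   # intercept, x, one slope

print(length(coef(fit.fact)))
print(length(coef(fit.quant)))

# Do the two models give the same fitted values?
fits.agree <- isTRUE(all.equal(fitted(fit.fact), fitted(fit.quant)))
print(fits.agree)

# Residual sums of squares of the two fits
print(c(factor = deviance(fit.fact), numeric = deviance(fit.quant)))
```

In this sketch the two fits are not expected to agree, so the answer to
the question may depend on the number of levels.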


Please drop me a few lines and/or direct me to some references. Thanks,

-james



