[R] Help with categorical predictors in regression models
Pamela Foggia
pamela.foggia at gmail.com
Fri Jun 19 23:32:59 CEST 2015
Hello,
In my regression models (linear and logistic models) I have two predictor
variables, both are categorical variables: DEGREE and REGION.
DEGREE is the educational level, which is an ordinal variable with five
levels (0-LT HIGH SCHOOL, 1-HIGH SCHOOL, 2-JUNIOR COLLEGE, 3-BACHELOR,
4-GRADUATE).
REGION is the respondent's region, which is a nominal variable with
9 levels (1-NEW ENGLAND, 2-MIDDLE ATLANTIC, 3-E. NOR. CENTRAL, 4-W. NOR.
CENTRAL, 5-SOUTH ATLANTIC, 6-E. SOU. CENTRAL, 7-W. SOU. CENTRAL,
8-MOUNTAIN, 9-PACIFIC).
In many examples I have read that, to use these predictors correctly as
categorical variables, I first have to wrap them in the factor() function,
for example like this
fit1 <- lm(Z ~ factor(X) + factor(Y))
fit2 <- glm(W ~ factor(X) + factor(Y), family=binomial(link="logit"))
obtaining the following output for the logistic regression
                   coef.est coef.se
(Intercept)         1.027    0.263
factor(DEGREE)1     0.301    0.134
factor(DEGREE)2     0.340    0.211
factor(DEGREE)3     0.748    0.168
factor(DEGREE)4     1.267    0.237
...
where clearly Z is a continuous variable and W is a binary variable. My
question is: as far as the ordinal variable X is concerned, would it be
more correct to use the ORDERED function rather than FACTOR? I mean an
operation like this
fit1 <- lm(Z ~ ordered(X) + factor(Y))
fit2 <- glm(W ~ ordered(X) + factor(Y), family=binomial(link="logit"))
where I obtain a different output like this
                   coef.est coef.se
(Intercept)         1.558    0.241
ordered(DEGREE).L   0.942    0.157
ordered(DEGREE).Q   0.215    0.160
ordered(DEGREE).C   0.118    0.111
ordered(DEGREE)^4  -0.106    0.143
...
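To make the comparison reproducible, here is a minimal sketch of the two
logistic specifications on simulated data. The data frame d and all of its
values are made up for illustration (the real survey data are not shown here);
only the variable names DEGREE, REGION, Z and W follow the description above.

```r
# Simulated stand-in data: names follow the post, values are invented.
set.seed(1)
n <- 500
d <- data.frame(
  DEGREE = sample(0:4, n, replace = TRUE),   # ordinal, 5 levels
  REGION = sample(1:9, n, replace = TRUE)    # nominal, 9 levels
)
d$Z <- 2 + 0.5 * d$DEGREE + rnorm(n)                # continuous response
d$W <- rbinom(n, 1, plogis(-0.5 + 0.3 * d$DEGREE))  # binary response

# Unordered factor: coefficients named factor(DEGREE)1 ... factor(DEGREE)4
fit_factor <- glm(W ~ factor(DEGREE) + factor(REGION),
                  family = binomial(link = "logit"), data = d)

# Ordered factor: coefficients named ordered(DEGREE).L, .Q, .C, ^4
fit_ordered <- glm(W ~ ordered(DEGREE) + factor(REGION),
                   family = binomial(link = "logit"), data = d)

names(coef(fit_factor))[2:5]
names(coef(fit_ordered))[2:5]

# Check whether the two fits give the same fitted probabilities
same_fit <- isTRUE(all.equal(fitted(fit_factor), fitted(fit_ordered),
                             tolerance = 1e-6))
same_fit
```

In this sketch the two calls should differ only in how the four DEGREE
coefficients are parameterised, so the fitted probabilities are expected to
agree up to numerical tolerance.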
What do the letters L, Q, C and the power ^4 (which I find in the output)
mean?
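For reference, the suffixes match the column names of the polynomial contrast
matrix that R applies to ordered factors when the default contrasts option is
in effect. The short sketch below prints the two coding schemes for a
five-level variable; the object names treat5 and poly5 are only illustrative.

```r
# Default contrasts: the first entry applies to unordered factors,
# the second to ordered factors
options("contrasts")

# Dummy (treatment) coding for 5 levels: level 1 is the baseline
treat5 <- contr.treatment(5)
treat5

# Orthogonal polynomial coding for 5 levels
poly5 <- contr.poly(5)
poly5

# Column names of the polynomial coding
colnames(poly5)
```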
Thanks in advance