[R] Behavior of ordered factors in glm
David Winsemius
dwinsemius at comcast.net
Sun Jan 6 00:52:22 CET 2008
I have a variable which is roughly age categories in decades. In the
original data, it came in coded:
> str(xxx)
'data.frame': 58271 obs. of 29 variables:
$ issuecat : Factor w/ 5 levels "0 - 39","40 - 49",..: 1 1 1 1...
snip
I then defined issuecat as ordered:
> xxx$issuecat<-as.ordered(xxx$issuecat)
When I include issuecat in a glm model, the result makes me think I
have asked R for a linear+quadratic+cubic+quartic polynomial fit. The
results are not terribly surprising under that interpretation, but I
was hoping for only a linear term (which I was taught to called a "test
of trend"), at least as a starting point.
> age.mdl<-glm(actual~issuecat,data=xxx,family="poisson")
> summary(age.mdl)
Call:
glm(formula = actual ~ issuecat, family = "poisson", data = xxx)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.3190 -0.2262 -0.1649 -0.1221 5.4776
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.31321 0.04865 -88.665 <2e-16 ***
issuecat.L 2.12717 0.13328 15.960 <2e-16 ***
issuecat.Q -0.06568 0.11842 -0.555 0.579
issuecat.C 0.08838 0.09737 0.908 0.364
issuecat^4 -0.02701 0.07786 -0.347 0.729
This also means my advice to a another poster this morning may have
been misleading. I have tried puzzling out what I don't understand by
looking at indices or searching in MASSv2, the Blue Book, Thompson's
application of R to Agresti's text, and the FAQ, so far without
success. What I would like to achieve is having the lowest age category
be a reference category (with the intercept being the log-rate) and
each succeeding age category be incremented by 1. The linear estimate
would be the log(risk-ratio) for increasing ages. I don't want the
higher order polynomial estimates. Am I hoping for too much?
--
David Winsemius
using R 2.6.1 in WinXP
More information about the R-help
mailing list