[R] Behavior of ordered factors in glm

David Winsemius dwinsemius at comcast.net
Sun Jan 6 00:52:22 CET 2008

I have a variable which is roughly age categories in decades. In the 
original data, it came in coded:
> str(xxx)
'data.frame':   58271 obs. of  29 variables:
 $ issuecat   : Factor w/ 5 levels "0 - 39","40 - 49",..: 1 1  1 1...

I then defined issuecat as ordered:
> xxx$issuecat<-as.ordered(xxx$issuecat)

When I include issuecat in a glm model, the result makes me think I 
have asked R for a linear+quadratic+cubic+quartic polynomial fit. The 
results are not terribly surprising under that interpretation, but I 
was hoping for only a linear term (which I was taught to called a "test 
of trend"), at least as a starting point.

> age.mdl<-glm(actual~issuecat,data=xxx,family="poisson")
> summary(age.mdl)

glm(formula = actual ~ issuecat, family = "poisson", data = xxx)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.3190  -0.2262  -0.1649  -0.1221   5.4776  

            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -4.31321    0.04865 -88.665   <2e-16 ***
issuecat.L   2.12717    0.13328  15.960   <2e-16 ***
issuecat.Q  -0.06568    0.11842  -0.555    0.579    
issuecat.C   0.08838    0.09737   0.908    0.364    
issuecat^4  -0.02701    0.07786  -0.347    0.729 

This also means my advice to a another poster this morning may have 
been misleading. I have tried puzzling out what I don't understand by 
looking at indices or searching in MASSv2, the Blue Book, Thompson's 
application of R to Agresti's text, and the FAQ, so far without 
success. What I would like to achieve is having the lowest age category 
be a reference category (with the intercept being the log-rate) and 
each succeeding age category  be incremented by 1. The linear estimate 
would be the log(risk-ratio) for increasing ages. I don't want the 
higher order polynomial estimates. Am I hoping for too much?

David Winsemius
using R 2.6.1 in WinXP

More information about the R-help mailing list