[R] interpreting the output of a glm with an ordered categorical predictor.

Sat Mar 3 09:30:31 CET 2012

On Mar 3, 2012, at 02:10 , kmuller wrote:

> 
> Greetings.
> 
> I'm a Master's student working on an analysis of herbivore damage on plants.
> I have a tried running a glm with one categorical predictor (aphid
> abundance) and a binomial response (presence/absence of herbivore damage).
> My predictor has four categories: high, medium, low, and none. I used the
> "ordered" function to sort my categories for a glm.
> 
> ah <-
> read.csv("http://depot.northwestern.edu/class/2012WI_PBC_435-0_AND_BIOL_SCI_313/muller/herbivoryEdit.csv")
> ah1<- ah[ah$date=="110810",] 
> ah2<-ah[ah$date=="110904",]
> 
> aphidOrder <- ordered(ah2$aphidLevelMax,levels=c("none", "low", "med",
> "high"))
> 
> ordAph <- glm(chewholebinom~aphidOrder,family=binomial,data=ah2)
> 
> 
> When I ran the summary for the glm (output pasted below), I could not tell
> which intercept referred to which factor level. My question is, what do .L,
> .Q, and .C mean and how can I relate these factors to my original factors
> (none, low, med, high)?

This is an oddity in the modeling code, dating back about two decades and probably impossible to get rid of at this stage. 

What you are seeing is polynomial contrasts: Linear, Quadratic, Cubic, with a slightly peculiar parametrization designed to be orthogonal in balanced designs. In such designs, you can see whether your effect is described by a low order polynomial by checking whether higher order terms are insignificant. Notice that this assumes that you can assign equidistant scores to the groups (i.e. codes 1:4 in your case).

(If the above is gibberish to you, maybe try something like 
matplot(contr.poly(15)[,1:4], type="b")
to see the first few polynomial curves.)

The whole thing is somewhat useful; e.g., for "response surface designs" where you systematically vary multiple factors in equidistant steps. However, it is not obvious that it should have been unleashed on all kinds of ordered factors.

I conjecture that you don't actually need it, a plain factor() with the levels in the right order should do. 

> 
> Thank you for your help,
> 
> Katherine
> 
> summary(ordAph)
> 
> Call:
> glm(formula = chewholebinom ~ aphidOrder, family = binomial, 
>    data = ah2)
> 
> Deviance Residuals: 
>    Min       1Q   Median       3Q      Max  
> -1.6512  -0.9817   0.7687   0.7687   1.5353  
> 
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)   
> (Intercept)  -0.05567    0.25097  -0.222   0.8245   
> aphidOrder.L -1.36755    0.49366  -2.770   0.0056 **
> aphidOrder.Q  0.36824    0.50195   0.734   0.4632   
> aphidOrder.C -0.09840    0.51011  -0.193   0.8470   
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>    Null deviance: 137.99  on 99  degrees of freedom
> Residual deviance: 124.05  on 96  degrees of freedom
> AIC: 132.05
> 
> Number of Fisher Scoring iterations: 4
> 
> 
> 
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/interpreting-the-output-of-a-glm-with-an-ordered-categorical-predictor-tp4440383p4440383.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com