[R] stepAIC and polynomial terms

Robert A LaBudde ral at lcfltd.com
Mon Mar 17 04:50:49 CET 2008


At 08:50 PM 3/16/2008, caspar wrote:
>Dear all,
>I have a question regarding the use of stepAIC and polynomial 
>(quadratic to be specific) terms in a binary logistic regression 
>model. I read in McCullagh and Nelder, (1989, p 89) and as far as I 
>remember from my statistics courses, higher-degree polynomial 
>effects should not be included without the main effects. If I 
>understand this correctly, following a stepwise model selection 
>based on AIC should not lead to a model where the main effect of 
>some continuous covariate is removed, but the quadratic term is kept.
>The question is, should I keep the quadratic term (note, there are 
>other main effects that were retained following the stepwise 
>algorithm) in the final model or should I delete it as well and move 
>on? Or should I retain the main effect as well?
>
>To picture it, the initial model to which I called stepAIC is:
>
>Call:  glm(formula = S ~ FR + Date * age + I(age^2), family = 
>logexposure(ExposureDays = DATA$int),      data = DATA)
>
>and the final one:
>
>Call:  glm(formula = S ~ FR + Date + I(age^2), family = 
>logexposure(ExposureDays = DATA$int),      data = DATA)
>
>Thanks very much in advance for your thoughts and suggestions,
>
>Caspar

1. You should only exclude "age" as a linear term if you have a sound 
causal reason for believing that the dependence on age is purely 
quadratic. Based on your example, you probably don't have this.

2. You also need to worry about interactions.

3. An alternative to your polynomial approach to such a causal 
variable as age is to categorize age into 5- or 10-year intervals, and 
see how the fit breaks down by these levels.

4. You should plot your data vs. age to see what the dependence is. 
Frequently the curve is flat up to a certain age, and then linear 
thereafter. This gives rise to a pseudo-quadratic relationship. You 
should be able to fit it better with the split plus a linear term.

5. Think about how age should affect your response before trying models.
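
The points above can be sketched in R roughly as follows. This is only 
an illustration under stated assumptions: the data are simulated, the 
family is plain binomial rather than the poster's logexposure family, 
age is taken to run from 20 to 80 years, and the knot at 50 is a 
hypothetical value of the kind one would read off a plot. The names 
dat, age_grp, knot and the fit_* objects are made up for the example.

```r
library(MASS)  # provides stepAIC

# Simulated stand-in data (hypothetical): flat up to age 50, rising after.
set.seed(1)
n   <- 500
dat <- data.frame(age = runif(n, 20, 80), FR = rnorm(n))
eta <- -2 + 0.5 * dat$FR + 0.08 * pmax(dat$age - 50, 0)
dat$S <- rbinom(n, 1, plogis(eta))

# Points 1 and 2: stepAIC does not know that I(age^2) is a higher-order
# version of age, so the linear term is protected here explicitly by
# listing it in the 'lower' scope. The interaction stays droppable.
full <- glm(S ~ FR * age + I(age^2), family = binomial, data = dat)
sel  <- stepAIC(full,
                scope = list(upper = ~ FR * age + I(age^2),
                             lower = ~ age),
                trace = FALSE)
formula(sel)

# Point 3: categorize age into 10-year intervals and fit by level.
dat$age_grp <- cut(dat$age, breaks = seq(20, 80, by = 10),
                   include.lowest = TRUE)
fit_cat <- glm(S ~ FR + age_grp, family = binomial, data = dat)

# Point 4: look at the observed proportion by age group (one would
# normally plot this), then fit "flat, then linear" with a hinge term.
tapply(dat$S, dat$age_grp, mean)
knot      <- 50  # hypothetical; in practice chosen from the plot
fit_hinge <- glm(S ~ FR + pmax(age - knot, 0), family = binomial,
                 data = dat)

# Compare against the full quadratic (linear term retained).
fit_quad <- glm(S ~ FR + age + I(age^2), family = binomial, data = dat)
aic_tab  <- AIC(fit_cat, fit_quad, fit_hinge)
aic_tab
```

With real data the same calls would take the actual data frame and 
family in place of the simulated ones. The lower scope only guarantees 
that age survives the selection; whether it belongs in the model is 
still a subject-matter question, as in point 5.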

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"


