[R] Some coefficients are doubled when I use the step() function

Ben Bolker bbolker at gmail.com
Sun Dec 9 23:00:42 CET 2012


Chris Beeley <chris.beeley <at> gmail.com> writes:

> Such a strange problem, can't figure it out at all. Using binomial glm
> models, and the step() function, so the call looks like this:
> 
> sectionmodel = glm(formula = Target3 ~ S1Q12_NUM.1 + S1Q9_NUM.1 + S1Q5_NUM.1 +
  [snip]

> But when I run step() on the resulting model, some of the coefficients
> are doubled when it comes back, with a "2" at the end, e.g. like this:
> 
> mymodel = step(sectionmodel, direction="backward", test="F")
> 
> summary(mymodel) returns this:
> 
> Coefficients:
>                  Estimate Std. Error z value Pr(>|z|)
> (Intercept)      -4.58519    0.55675  -8.236   <2e-16 ***
> S1Q12_NUM.1       0.18446    0.08576   2.151   0.0315 *
> S1Q4.12           0.56893    0.40281   1.412   0.1578
> S1Q12_OTHVIOL.11  0.56435    0.38262   1.475   0.1402
> S1Q12_GBH.11      0.49199    0.33175   1.483   0.1381
> S1Q7.11          -1.27330    1.12897  -1.128   0.2594
> S1Q7.12          -1.83927    1.16909  -1.573   0.1157
> S1Q5.11           0.91742    1.19489   0.768   0.4426
> S1Q5.12           2.16861    1.19864   1.809   0.0704 .
> S1Q12_DRUG.11    -0.48400    0.29898  -1.619   0.1055
 
> As you can see S1Q7.1 and S1Q5.1 are duplicated as "S1Q7.11" and
> "S1Q7.12" etc.  I've googled and read and re-read the step() and
> stepAIC() documentation and I just can't figure out what it could
> mean. Removing the test="F" bit also generates the same behaviour.
> Any help greatly appreciated.  Chris Beeley Institute of Mental
> Health, UK

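  My guess is that S1Q7.1 and S1Q5.1 are (possibly accidentally)
categorical variables (factors). R names each factor coefficient by
appending the level label (or the contrast column number) to the
variable name, so "S1Q7.11" and "S1Q7.12" would appear if either
the second and third levels of the factors are "1" and "2", or you
have set sum-to-zero contrasts somewhere along the line.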

  Note that other variables also have a digit appended to
their names in the coefficient table, which indicates that they
are also being treated as categorical variables, and that their
levels are coded numerically ... (e.g. S1Q4.1, which shows up
as "S1Q4.12")
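
  A minimal sketch of the naming behaviour, using simulated data
(the data frame `d`, the response `y` and the factor `g` are made up
for illustration; only the variable name S1Q7.1 is borrowed from the
post). It shows both routes to a trailing "1"/"2": numeric level
labels under the default treatment contrasts, and sum-to-zero
contrasts, which number their columns regardless of the level labels.

```r
set.seed(1)
n <- 200

# Hypothetical data: a 0/1 response, a factor whose levels are "0", "1", "2",
# and a second factor whose levels are letters.
d <- data.frame(
  y      = rbinom(n, 1, 0.5),
  S1Q7.1 = factor(sample(0:2, n, replace = TRUE)),
  g      = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)

# Default (treatment) contrasts: coefficient name = variable name + level label.
fit_trt <- glm(y ~ S1Q7.1, family = binomial, data = d)
names(coef(fit_trt))
# "(Intercept)" "S1Q7.11" "S1Q7.12"

# Sum-to-zero contrasts: columns are numbered 1, 2, ... whatever the labels are.
fit_sum <- glm(y ~ g, family = binomial, data = d,
               contrasts = list(g = "contr.sum"))
names(coef(fit_sum))
# "(Intercept)" "g1" "g2"
```

  In the first fit the "1" and "2" are level labels; in the second
they are just contrast column numbers.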

  My prediction is that this "doubling" is independent of
the use of step(), and that you would see these parameters
reflected in the summary() of the full model ...
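
  One way to check this, sketched on simulated data (everything
here is hypothetical apart from step() and the name S1Q5.1: the data
frame `dat`, the predictors `x_num` and `noise`, and the response
`y` stand in for the real data, which I don't have). The idea is
to look at which columns are factors, compare the coefficient names
before and after step(), and convert a column back to numeric if it
was never meant to be categorical.

```r
set.seed(42)
n <- 300

# Hypothetical stand-in for the real data set.
dat <- data.frame(
  x_num  = rnorm(n),
  S1Q5.1 = factor(sample(0:2, n, replace = TRUE)),
  noise  = rnorm(n)
)
dat$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x_num))

# Which predictors are being treated as factors?
sapply(dat, is.factor)

# Coefficient names in the full model, before any stepwise selection.
full <- glm(y ~ x_num + S1Q5.1 + noise, family = binomial, data = dat)
names(coef(full))

# Backward selection can only drop terms, so it cannot invent new names.
reduced <- step(full, direction = "backward", trace = 0)
all(names(coef(reduced)) %in% names(coef(full)))

# If the variable was meant to be numeric, convert via character so that
# the level labels (not the internal integer codes) are recovered.
dat$S1Q5.1_num <- as.numeric(as.character(dat$S1Q5.1))
```

  If the factor coding is intentional, nothing needs fixing: the
two coefficients are simply the effects of levels "1" and "2"
relative to the baseline level.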



