[R] rms:fastbw variable selection differences with AIC vs. p value methods
Rob James
aetiologic at gmail.com
Fri Aug 19 21:04:44 CEST 2011
I want to use a parsimonious model to draw nomograms, as the full
model is too complex to draw them readily (it has several interactions
of continuous variables). However, one interesting variable stays or
leaves depending on whether I choose the "p value" or the "AIC" rule in
fastbw(). My question boils down to this: Is there a theoretical reason
to prefer one over the other?
Consider:
fastbw(model94c, aic=1e10)
 Deleted             Chi-Sq d.f.      P Residual d.f.      P     AIC
 ToD                   0.11    3 0.9903     0.11    3 0.9903   -5.89
 Experience * ToD      2.56    3 0.4646     2.67    6 0.8487   -9.33
 Experience * Assoc    0.45    2 0.7970     3.13    8 0.9262  -12.87
 RatePressure          2.99    3 0.3939     6.11   11 0.8658  -15.89
 DW_height_t           2.92    3 0.4047     9.03   14 0.8293  -18.97
 TBV * Experience      3.46    3 0.3260    12.49   17 0.7698  -21.51
 Experience * Sex      0.05    1 0.8153    12.54   18 0.8181  -23.46
 Experience            0.18    1  0.672    12.72   19 0.8526  -25.28
 Sex                   1.19    1 0.2745    13.91   20 0.8348  -26.09
 Assoc                 6.09    2 0.0475    20.01   22 0.5826  -23.99
 Experience * Pulse   10.53    3 0.0146    30.53   25 0.2049  -19.47
 Sex * ToD            18.24    3 0.0004    48.77   28 0.0088   -7.23
 PulsePressure        21.15    3 0.0001    69.92   31 0.0001    7.92
 Race                 19.87    2 0.0000    89.79   33 0.0000   23.79
 Pulse                25.31    3 0.0000   115.09   36 0.0000   43.09
 Age * Experience    202.80    3 0.0000   317.89   39 0.0000  239.89
 TBV                 282.41    3 0.0000   600.30   42 0.0000  516.30
 Location            310.19   14 0.0000   910.50   56 0.0000  798.50
 Age                 809.64    3 0.0000  1720.13   59 0.0000 1602.13
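
As a rough check on how the columns relate, here is a minimal base R
sketch that recomputes three rows of the output. It assumes, judging
from the numbers themselves, that the AIC column is the residual
chi-square minus 2 times the residual d.f., and that both P columns are
upper-tail chi-square probabilities. The data frame and its column names
are my own, not anything fastbw() returns.

```r
# Values copied from three rows of the fastbw() output above.
steps <- data.frame(
  deleted  = c("Assoc", "Sex * ToD", "PulsePressure"),
  chisq    = c(6.09, 18.24, 21.15),   # chi-square for the deleted term
  df       = c(2, 3, 3),
  resid    = c(20.01, 48.77, 69.92),  # residual (cumulative) chi-square
  resid_df = c(22, 28, 31)
)

# Assumption: AIC column = residual chi-square - 2 * residual d.f.
steps$aic <- steps$resid - 2 * steps$resid_df

# Upper-tail chi-square probabilities for the two P columns.
steps$p       <- pchisq(steps$chisq, steps$df, lower.tail = FALSE)
steps$resid_p <- pchisq(steps$resid, steps$resid_df, lower.tail = FALSE)

print(steps, digits = 4)
```

The recomputed values agree with the printed table up to rounding of
the chi-square statistics, which suggests that both stopping rules are
reading the same pooled residual test, just on different scales.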
The ordering of the variables is as expected and is consistent with my
substantive knowledge of the outcome.
The problematic variable is Sex * ToD. When I use the p value rule with
an SLS of 0.01, the variable is retained, but when I use AIC, Sex * ToD
is dropped. This reflects the fact that, while the Sex * ToD interaction
is theoretically interesting, the AIC at that step is negative and
relatively small in magnitude (-7.23), even as the residual p value
slips just below 0.01 (0.0088).
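
One way to put the two rules on the same scale is to ask what p value
cutoff the AIC rule implies. The sketch below assumes the fastbw()
defaults (aics = 0, k.aic = 2), so that deletion continues while
chi-square - 2 * d.f. <= 0. The helper implied_alpha() is hypothetical
and written only for this illustration; it is not part of rms.

```r
# Hypothetical helper (not part of rms): the p value cutoff implied by
# "keep deleting while chi-square - k * d.f. <= 0" for a test on df d.f.
implied_alpha <- function(df, k = 2) {
  pchisq(k * df, df, lower.tail = FALSE)
}

# Single-term tests on 1 to 3 d.f., and the pooled residual test on
# 28 d.f. (the residual d.f. at the Sex * ToD step).
alpha <- sapply(c(1, 2, 3, 28), implied_alpha)
names(alpha) <- c("1 df", "2 df", "3 df", "28 df")
print(round(alpha, 4))
```

If this reading is right, the implied cutoff shrinks as the d.f. grow:
it is well above 0.01 for a 1 to 3 d.f. test but below 0.0088 for the
28 d.f. residual test, which would explain why AIC drops Sex * ToD
while SLS = 0.01 keeps it. The corresponding calls on my model would
presumably be fastbw(model94c, rule = "p", sls = 0.01) and
fastbw(model94c, rule = "aic").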
Is this a matter of judgement, or are there statistical considerations
that should be invoked? Any caveats?
Is there a theoretical reason to choose AIC over p value methods, or is
either acceptable?