[R] Extreme AIC or BIC values in glm(), logistic regression

Gad Abraham gabraham at csse.unimelb.edu.au
Thu Mar 19 01:55:01 CET 2009


Maggie Wang wrote:
> Dear R-users,
> 
> I use glm() to do logistic regression and use stepAIC() to do stepwise model
> selection.
> 
> The AIC typically comes out at about 100, and a good fit can be as low as
> around 70. But for some models, the AIC goes to extreme values like 1000.
> When I check the p-values, all the independent variables (about 30 of them)
> included in the equation are very significant, which is impossible, because
> we expect some to be dropped. This situation is not uncommon.
> 
> The summary output looks like this:
> 
> Coefficients:
>                               Estimate Std. Error   z value Pr(>|z|)
> (Intercept)                   4.883e+14  1.671e+07  29217415   <2e-16 ***
> g761                         -5.383e+14  9.897e+07  -5438529   <2e-16 ***
> g2809                        -1.945e+15  1.082e+08 -17977871   <2e-16 ***
> g3106                        -2.803e+15  9.351e+07 -29976674   <2e-16 ***
> g4373                        -9.272e+14  6.534e+07 -14190077   <2e-16 ***
> g4583                        -2.279e+15  1.223e+08 -18640563   <2e-16 ***
> g761:g2809                   -5.101e+14  4.693e+08  -1086931   <2e-16 ***
> g761:g3106                   -3.399e+16  6.923e+08 -49093218   <2e-16 ***
> g2809:g3106                   3.016e+15  6.860e+08   4397188   <2e-16 ***
> g761:g4373                    3.180e+15  4.595e+08   6920270   <2e-16 ***
> g2809:g4373                  -5.184e+15  4.436e+08 -11685382   <2e-16 ***
> g3106:g4373                   1.589e+16  2.572e+08  61788148   <2e-16 ***
> g761:g4583                   -1.419e+16  8.199e+08 -17303033   <2e-16 ***
> g2809:g4583                  -2.540e+16  8.151e+08 -31156781   <2e-16 ***

I don't have an answer (and you haven't supplied the full code), but one 
obvious thing is that the estimated coefficients are extremely large 
(they are on the linear-predictor, i.e. log-odds, scale, so on the odds 
scale it's even worse since you exponentiate them). Perhaps this is due 
to very high collinearity among your variables (however, the standard 
errors are low relative to the estimates, so maybe not), and/or issues 
of scaling (e.g., if your variables take very small values, use scale() 
to standardise them).
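
As a rough illustration of the two checks suggested above (collinearity 
and scaling), here is a minimal sketch. The data are simulated and the 
names (dat, x1, x2, x3, the 0.95 cutoff) are made up for the example; 
they are not taken from the original poster's data, so adapt them to 
your own model.

```r
# Hypothetical data: tiny-valued predictors, two of them nearly identical.
set.seed(1)
n  <- 200
x1 <- rnorm(n, sd = 1e-6)
x2 <- x1 + rnorm(n, sd = 1e-8)        # almost a copy of x1
x3 <- rnorm(n, sd = 1e-6)
y  <- rbinom(n, 1, plogis(1e6 * x3))  # outcome depends on x3 only
dat   <- data.frame(y, x1, x2, x3)
preds <- c("x1", "x2", "x3")

# 1. Collinearity check: pairwise correlations among the predictors.
#    The 0.95 cutoff is an arbitrary choice for this sketch.
cors <- cor(dat[preds])
high <- which(abs(cors) > 0.95 & upper.tri(cors), arr.ind = TRUE)
print(high)                           # pairs that are nearly collinear

# 2. Scaling: standardise each predictor to mean 0 and sd 1.
dat_std <- dat
dat_std[preds] <- as.data.frame(scale(dat[preds]))

# Refit after dropping one variable of the collinear pair (x2 here).
fit <- glm(y ~ x1 + x3, data = dat_std, family = binomial)
print(summary(fit)$coefficients)
```

If the coefficients are still enormous after standardising and removing 
near-duplicate predictors, the cause is probably something else, and the 
full code and data would be needed to say more.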

-- 
Gad Abraham
MEng Student, Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabraham at csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham


