[R] glm and lrm disagree with zero table cells
Eric Rescorla
ekr at rtfm.com
Thu Oct 24 17:52:13 CEST 2002
I've noticed that glm and lrm give extremely different results if you
attempt to fit a saturated model to a dataset with zero cells. Consider,
for instance the data from, Agresti's Death Penalty example [0].
The crosstab table is:
, , PENALTY = NO
VIC
DEF BLACK WHITE
BLACK 97 52
WHITE 9 132
, , PENALTY = YES
VIC
DEF BLACK WHITE
BLACK 6 11
WHITE 0 19
Regression with an unsaturated model produces essentially
the same fit parameters with both glm and lrm. However,
if we try to fit a saturated model....
FITTING WITH GLM:
> summary(glm(PENALTY~DEF*VIC,binomial))
Call:
glm(formula = PENALTY ~ DEF * VIC, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.6195 -0.5186 -0.5186 -0.3465 2.3845
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.7830 0.4207 -6.615 3.71e-11 ***
DEFWHITE -4.7823 8.8981 -0.537 0.5910
VICWHITE 1.2296 0.5358 2.295 0.0217 *
DEFWHITE:VICWHITE 4.3973 8.9076 0.494 0.6216
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 226.51 on 325 degrees of freedom
Residual deviance: 218.39 on 322 degrees of freedom
AIC: 226.39
Number of Fisher Scoring iterations: 6
FITTING WITH LRM:
> lrm(PENALTY~DEF*VIC)
Logistic Regression Model
lrm(formula = PENALTY ~ DEF * VIC)
Frequencies of Responses
NO YES
290 36
Obs Max Deriv Model L.R. d.f. P C Dxy
326 0.002 8.13 3 0.0435 0.624 0.248
Gamma Tau-a R2 Brier
0.383 0.049 0.049 0.096
Coef S.E. Wald Z P
Intercept -2.783 0.4207 -6.62 0.0000
DEF=WHITE -5.490 20.8691 -0.26 0.7925
VIC=WHITE 1.230 0.5358 2.29 0.0217
DEF=WHITE * VIC=WHITE 5.105 20.8732 0.24 0.8068
If we fill in the remaining table cell with a dummy value, [1]
however, then glm and lrm produce essentially the same result.
Here's the glm result.
> summary(glm(PENALTY~DEF*VIC,binomial))
Call:
glm(formula = PENALTY ~ DEF * VIC, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.6195 -0.5186 -0.5186 -0.3465 2.3845
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.7829 0.4192 -6.639 3.15e-11 ***
DEFWHITE 0.5857 1.1343 0.516 0.6056
VICWHITE 1.2296 0.5346 2.300 0.0215 *
DEFWHITE:VICWHITE -0.9707 1.2070 -0.804 0.4213
---
Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 230.90 on 326 degrees of freedom
Residual deviance: 224.88 on 323 degrees of freedom
AIC: 232.88
Number of Fisher Scoring iterations: 4
So, my question here is: is this normal behavior? If it
is, perhaps someone could speculate on why the results are
different.
Thanks,
-Ekr
[0] Agresti, A., "Categorical Data Analysis", Wiley 1990.
The data set can be found at http://www.rtfm.com/death.txt
[1] http://www.rtfm.com/death-filled-in.txt
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list