[R] glm and lrm disagree with zero table cells

Eric Rescorla ekr at rtfm.com
Thu Oct 24 17:52:13 CEST 2002


I've noticed that glm and lrm give extremely different results if you
attempt to fit a saturated model to a dataset with zero cells. Consider,
for instance, the data from Agresti's Death Penalty example [0].

The crosstab table is:

, , PENALTY = NO

       VIC
DEF     BLACK WHITE
  BLACK    97    52
  WHITE     9   132

, , PENALTY = YES

       VIC
DEF     BLACK WHITE
  BLACK     6    11
  WHITE     0    19
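
For anyone who wants to rerun the fits without downloading the file,
here is a rough sketch that rebuilds case-level data from the counts in
the table above. The data frame name `death` and the helper `cells` are
my own labels, not anything taken from death.txt, and I am assuming the
file holds one row per case with these three columns.

```r
# Cell counts copied from the cross-tabulation above.
# expand.grid varies DEF fastest, then VIC, then PENALTY.
cells <- expand.grid(DEF     = c("BLACK", "WHITE"),
                     VIC     = c("BLACK", "WHITE"),
                     PENALTY = c("NO", "YES"))
cells$n <- c(97, 9, 52, 132,   # PENALTY = NO
             6, 0, 11, 19)     # PENALTY = YES

# Repeat each row n times to get one row per case (hypothetical layout).
death <- cells[rep(seq_len(nrow(cells)), cells$n),
               c("DEF", "VIC", "PENALTY")]

# Should reproduce the table above, including the empty cell.
xtabs(~ DEF + VIC + PENALTY, data = death)

# Saturated fit; with a factor response glm models the second level (YES).
# A warning about fitted probabilities of 0 or 1 may appear here.
fit <- glm(PENALTY ~ DEF * VIC, family = binomial, data = death)
```

The same `death` data frame can be passed to lrm via its data argument.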


Regression with an unsaturated model produces essentially
the same fit parameters with both glm and lrm. However,
if we fit a saturated model, the estimates and standard errors
for DEF=WHITE and for the interaction term differ substantially:

FITTING WITH GLM:
> summary(glm(PENALTY~DEF*VIC,binomial))

Call:
glm(formula = PENALTY ~ DEF * VIC, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.6195  -0.5186  -0.5186  -0.3465   2.3845  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -2.7830     0.4207  -6.615 3.71e-11 ***
DEFWHITE           -4.7823     8.8981  -0.537   0.5910    
VICWHITE            1.2296     0.5358   2.295   0.0217 *  
DEFWHITE:VICWHITE   4.3973     8.9076   0.494   0.6216    
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 226.51  on 325  degrees of freedom
Residual deviance: 218.39  on 322  degrees of freedom
AIC: 226.39

Number of Fisher Scoring iterations: 6


FITTING WITH LRM:
> lrm(PENALTY~DEF*VIC)

Logistic Regression Model

lrm(formula = PENALTY ~ DEF * VIC)


Frequencies of Responses
 NO YES 
290  36 

       Obs  Max Deriv Model L.R.       d.f.          P          C        Dxy 
       326      0.002       8.13          3     0.0435      0.624      0.248 
     Gamma      Tau-a         R2      Brier 
     0.383      0.049      0.049      0.096 

                      Coef   S.E.    Wald Z P     
Intercept             -2.783  0.4207 -6.62  0.0000
DEF=WHITE             -5.490 20.8691 -0.26  0.7925
VIC=WHITE              1.230  0.5358  2.29  0.0217
DEF=WHITE * VIC=WHITE  5.105 20.8732  0.24  0.8068



If we fill in the empty table cell with a dummy value [1],
however, glm and lrm produce essentially the same result.
Here's the glm result:

> summary(glm(PENALTY~DEF*VIC,binomial))

Call:
glm(formula = PENALTY ~ DEF * VIC, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.6195  -0.5186  -0.5186  -0.3465   2.3845  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -2.7829     0.4192  -6.639 3.15e-11 ***
DEFWHITE            0.5857     1.1343   0.516   0.6056    
VICWHITE            1.2296     0.5346   2.300   0.0215 *  
DEFWHITE:VICWHITE  -0.9707     1.2070  -0.804   0.4213    
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 230.90  on 326  degrees of freedom
Residual deviance: 224.88  on 323  degrees of freedom
AIC: 232.88

Number of Fisher Scoring iterations: 4
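
As a cross-check on the numbers above: in a saturated logit model the
fitted log-odds in each cell equal the observed log-odds, so the
coefficients can be computed directly from the cell counts. The sketch
below does that for both tables. The function name `sat_coef` and the
two-letter cell labels are made up for illustration, and the filled-in
table is assumed to add a single YES case to the empty cell (which is
what the 327 observations in the second fit suggest).

```r
# Saturated-model coefficients from cell counts, using treatment
# contrasts with BLACK as the reference level for DEF and VIC.
sat_coef <- function(yes, no) {
  logit <- log(yes / no)   # observed log-odds of PENALTY = YES per cell
  c("(Intercept)"       = logit[["BB"]],
    "DEFWHITE"          = logit[["WB"]] - logit[["BB"]],
    "VICWHITE"          = logit[["BW"]] - logit[["BB"]],
    "DEFWHITE:VICWHITE" = logit[["WW"]] - logit[["WB"]] -
                          logit[["BW"]] + logit[["BB"]])
}

# Labels are DEF then VIC: "WB" = white defendant, black victim.
no  <- c(BB = 97, BW = 52, WB = 9, WW = 132)
yes <- c(BB = 6,  BW = 11, WB = 0, WW = 19)

# Original table: the zero cell has log-odds log(0/9) = -Inf.
round(sat_coef(yes, no), 4)

# Filled-in table (assumed: one dummy YES case in the empty cell).
yes_filled <- replace(yes, "WB", 1)
round(sat_coef(yes_filled, no), 4)
```

With the zero cell, the DEFWHITE and interaction terms come out as -Inf
and Inf, so neither program can be reporting a finite maximum for them;
the intercept and VICWHITE terms are finite and agree with both fits.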


So, my question is: is this normal behavior? If it is,
perhaps someone could explain why the two results are
different.

Thanks,
-Ekr


[0] Agresti, A., "Categorical Data Analysis", Wiley 1990.
The data set can be found at http://www.rtfm.com/death.txt

[1] http://www.rtfm.com/death-filled-in.txt




