[R] Goodness of fit of binary logistic model

David Winsemius dwinsemius at comcast.net
Fri Aug 5 20:07:34 CEST 2011


On Aug 5, 2011, at 12:53 PM, Paul Smith wrote:

> On Fri, Aug 5, 2011 at 5:35 PM, David Winsemius <dwinsemius at comcast.net 
> > wrote:
>>>>> I have just estimated this model:
>>>>> -----------------------------------------------------------
>>>>> Logistic Regression Model
>>>>>
>>>>> lrm(formula = Y ~ X16, x = T, y = T)
>>>>>
>>>>>                   Model Likelihood     Discrimination    Rank  
>>>>> Discrim.
>>>>>                      Ratio Test            Indexes           
>>>>> Indexes
>>>>>
>>>>> Obs            82    LR chi2      5.58    R2       0.088    C
>>>>> 0.607
>>>>> 0             46    d.f.            1    g        0.488     
>>>>> Dxy     0.215
>>>>> 1             36    Pr(> chi2) 0.0182    gr       1.629     
>>>>> gamma   0.589
>>>>> max |deriv| 9e-11                         gp       0.107    tau-a
>>>>> 0.107
>>>>>                                        Brier    0.231
>>>>>
>>>>>        Coef    S.E.   Wald Z Pr(>|Z|)
>>>>> Intercept -1.3218 0.5627 -2.35  0.0188
>>>>> X16=1      1.3535 0.6166  2.20  0.0282
>>>>> -----------------------------------------------------------
>>>>>
>>>>> Analyzing the goodness of fit:
>>>>>
>>>>> -----------------------------------------------------------
>>>>>>
>>>>>> resid(model.lrm,'gof')
>>>>>
>>>>> Sum of squared errors     Expected value|H0                    SD
>>>>>       1.890393e+01          1.890393e+01          6.073415e-16
>>>>>                  Z                     P
>>>>>      -8.638125e+04          0.000000e+00
>>>>> -----------------------------------------------------------
>>>>>
>>>>> From the above calculated p-value (0.000000e+00), one should
>>>>> discard this model. However, there is something that is puzzling
>>>>> me: if the 'Expected value|H0' is so close to the 'Sum of squared
>>>>> errors', why should one discard the model? I am certainly missing
>>>>> something.
>>>>
>>>> It's hard to tell what you are missing, since you have not
>>>> described your reasoning at all. So I guess what is in error is
>>>> your expectation that we would have drawn all of the unstated
>>>> inferences that you draw when offered the output from lrm. (I
>>>> certainly did not draw the inference that "one should discard the
>>>> model".)
>>>>
>>>> resid is a function designed for use with glm and lm models. Why
>>>> aren't you using residuals.lrm?
>>>
>>> ----------------------------------------------------------
>>>>
>>>> residuals.lrm(model.lrm,'gof')
>>>
>>> Sum of squared errors     Expected value|H0                    SD
>>>        1.890393e+01          1.890393e+01          6.073415e-16
>>>                   Z                     P
>>>       -8.638125e+04          0.000000e+00
>>
>> Great. Now please answer the more fundamental question. Why do you
>> think this means "discard the model"?
>
> Before answering that, let me tell you
>
> resid(model.lrm,'gof')
>
> calls residuals.lrm() -- so both approaches produce the same results.
> (See the examples given by ?residuals.lrm)
>
> To answer your question, I invoke the reasoning given by Frank  
> Harrell at:
>
> http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-td3508127.html
>
> He writes:
>
> «The test in the rms package's residuals.lrm function is the le Cessie
> - van Houwelingen - Copas - Hosmer unweighted sum of squares test for
> global goodness of fit.  Like all statistical tests, a large P-value
> has no information other than there was not sufficient evidence to
> reject the null hypothesis.  Here the null hypothesis is that the true
> probabilities are those specified by the model. »
>
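As an aside on the resid/residuals.lrm point above: resid() is an S3 generic, so for an object of class "lrm" it dispatches to residuals.lrm() and the two calls are equivalent. A minimal sketch (simulated data roughly matching the posted model; assumes the rms package is installed):

```r
# resid() is an S3 generic; for an "lrm" fit it dispatches to
# residuals.lrm(), so both calls run the same code.
library(rms)

set.seed(1)
X16 <- factor(rbinom(82, 1, 0.5))                    # one binary predictor
Y   <- rbinom(82, 1, ifelse(X16 == "1", 0.5, 0.3))   # binary outcome

fit <- lrm(Y ~ X16, x = TRUE, y = TRUE)

# Same code path, so the results are identical
identical(resid(fit, "gof"), residuals.lrm(fit, "gof"))  # TRUE
```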

How does that apply to your situation? You have a small (one might  
even say infinitesimal) p-value.
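For what it's worth, the Z statistic can be traced back to the printed output: it is (sum of squared errors - expected value under H0) / SD, and the reported SD (6.073415e-16) is on the order of machine epsilon. A difference between the two sums that is invisible at the printed precision therefore still produces an astronomical |Z|:

```r
# Recompute the gof Z statistic from the values printed above.
# Z = (sum of squared errors - expected value under H0) / SD
sse      <- 1.890393e+01   # Sum of squared errors (as printed)
expected <- 1.890393e+01   # Expected value | H0   (as printed)
sd_h0    <- 6.073415e-16   # SD: machine-epsilon scale

(sse - expected) / sd_h0   # 0 from the rounded printout; lrm's
                           # unrounded internals gave Z = -8.64e+04
```

With a single binary predictor the model fits only two distinct probabilities and is saturated, which is consistent with the near-zero SD under H0: the global test has essentially nothing left to detect, and the huge |Z| looks like a numerical artifact of dividing by that near-zero SD rather than evidence of lack of fit.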


> From Harrell's argument, does it not follow that if the p-value is
> zero one should reject the null hypothesis?

No, it doesn't follow at all, since that is not what he said. You are
committing a common logical error: "if A then B" does _not_ imply "if
not-A then not-B".

> Please, correct if it is not
> correct what I say, and please direct me towards a way of establishing
> the goodness of fit of my model.

You need to state your research objectives and describe the science in
your domain. Then you need to describe your data-gathering methods and
your analytic process. Then there might be a basis for further comment.

-- 
David Winsemius, MD
West Hartford, CT
