[R] Goodness of fit of binary logistic model
David Winsemius
dwinsemius at comcast.net
Fri Aug 5 20:40:43 CEST 2011
On Aug 5, 2011, at 2:29 PM, Paul Smith wrote:
> On Fri, Aug 5, 2011 at 7:07 PM, David Winsemius <dwinsemius at comcast.net
> > wrote:
>>>>>>> I have just estimated this model:
>>>>>>> -----------------------------------------------------------
>>>>>>> Logistic Regression Model
>>>>>>>
>>>>>>> lrm(formula = Y ~ X16, x = T, y = T)
>>>>>>>
>>>>>>>                   Model Likelihood     Discrimination    Rank Discrim.
>>>>>>>                      Ratio Test           Indexes           Indexes
>>>>>>> Obs            82    LR chi2      5.58    R2      0.088    C      0.607
>>>>>>>  0             46    d.f.            1    g       0.488    Dxy    0.215
>>>>>>>  1             36    Pr(> chi2) 0.0182    gr      1.629    gamma  0.589
>>>>>>> max |deriv| 9e-11                         gp      0.107    tau-a  0.107
>>>>>>>                                           Brier   0.231
>>>>>>>
>>>>>>> Coef S.E. Wald Z Pr(>|Z|)
>>>>>>> Intercept -1.3218 0.5627 -2.35 0.0188
>>>>>>> X16=1 1.3535 0.6166 2.20 0.0282
>>>>>>> -----------------------------------------------------------
>>>>>>>
>>>>>>> Analyzing the goodness of fit:
>>>>>>>
>>>>>>> -----------------------------------------------------------
>>>>>>>>
>>>>>>>> resid(model.lrm,'gof')
>>>>>>>
>>>>>>> Sum of squared errors     Expected value|H0                    SD
>>>>>>>          1.890393e+01          1.890393e+01          6.073415e-16
>>>>>>>                     Z                     P
>>>>>>>         -8.638125e+04          0.000000e+00
>>>>>>> -----------------------------------------------------------
>>>>>>>
>>>>>>> From the above calculated p-value (0.000000e+00), one should
>>>>>>> discard this model. However, there is something that is
>>>>>>> puzzling me: if the 'Expected value|H0' coincides so exactly
>>>>>>> with the 'Sum of squared errors', why should one discard the
>>>>>>> model? I am certainly missing something.
>>>>>>
>>>>>> It's hard to tell what you are missing, since you have not
>>>>>> described your reasoning at all. So I guess what is in error is
>>>>>> your expectation that we would have drawn all of the unstated
>>>>>> inferences that you draw when offered the output from lrm. (I
>>>>>> certainly did not draw the inference that "one should discard
>>>>>> the model".)
>>>>>>
>>>>>> resid is a function designed for use with glm and lm models.
>>>>>> Why aren't
>>>>>> you
>>>>>> using residuals.lrm?
>>>>>
>>>>> ----------------------------------------------------------
>>>>>>
>>>>>> residuals.lrm(model.lrm,'gof')
>>>>>
>>>>> Sum of squared errors     Expected value|H0                    SD
>>>>>          1.890393e+01          1.890393e+01          6.073415e-16
>>>>>                     Z                     P
>>>>>         -8.638125e+04          0.000000e+00
>>>>
>>>> Great. Now please answer the more fundamental question. Why do
>>>> you think this means "discard the model"?
>>>
>>> Before answering that, let me tell you
>>>
>>> resid(model.lrm,'gof')
>>>
>>> calls residuals.lrm() -- so both approaches produce the same
>>> results.
>>> (See the examples given by ?residuals.lrm)
>>>
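[For readers following along: the dispatch claim is easy to verify. A minimal sketch, assuming the rms package is installed, with made-up data standing in for the poster's 82 observations (X16 and Y here are hypothetical):]

```r
library(rms)

# Hypothetical data in place of the poster's: 82 obs, one binary predictor
set.seed(1)
X16 <- factor(rbinom(82, 1, 0.5))
Y   <- rbinom(82, 1, 36 / 82)

model.lrm <- lrm(Y ~ X16, x = TRUE, y = TRUE)

# resid() is an alias for the residuals() generic; for an object of
# class "lrm" it dispatches to residuals.lrm(), so the two calls
# return identical results.
g1 <- resid(model.lrm, "gof")
g2 <- residuals.lrm(model.lrm, "gof")
identical(g1, g2)
```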
>>> To answer your question, I invoke the reasoning given by Frank
>>> Harrell at:
>>>
>>>
>>> http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-td3508127.html
>>>
>>> He writes:
>>>
>>> «The test in the rms package's residuals.lrm function is the
>>> le Cessie - van Houwelingen - Copas - Hosmer unweighted sum of
>>> squares test for global goodness of fit. Like all statistical
>>> tests, a large P-value has no information other than there was not
>>> sufficient evidence to reject the null hypothesis. Here the null
>>> hypothesis is that the true probabilities are those specified by
>>> the model.»
>>>
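[Concretely, the test returns a named vector whose "P" element is the p-value Harrell describes. A sketch with simulated data, assuming rms is installed; a continuous hypothetical predictor is used so the statistic is non-degenerate, since with a single binary predictor the model reproduces the two cell proportions exactly:]

```r
library(rms)

set.seed(2)
x <- rnorm(100)                               # hypothetical predictor
y <- rbinom(100, 1, plogis(-0.5 + 0.8 * x))   # simulated binary outcome

fit <- lrm(y ~ x, x = TRUE, y = TRUE)

# le Cessie - van Houwelingen - Copas - Hosmer unweighted
# sum-of-squares test; the null is that the model's probabilities are
# the true ones, so a *small* P is evidence of lack of fit.
gof <- resid(fit, "gof")
gof["P"]
```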
>>
>> How does that apply to your situation? You have a small (one might
>> even say infinitesimal) p-value.
>>
>>
>>> Does it not follow from Harrell's argument that if the p-value is
>>> zero one should reject the null hypothesis?
>>
>> No, it doesn't follow at all, since that is not what he said. You are
>> committing a common logical error. "If A then B" does _not_ imply "If
>> Not-A then Not-B".
>>
>>> Please correct me if what I say is not correct, and please direct
>>> me towards a way of establishing the goodness of fit of my model.
>>
>> You need to state your research objectives and describe the science
>> in your domain. Then you need to describe your data-gathering
>> methods and your analytic process. Then there might be a basis for
>> further comment.
>
> I will try to read the original paper where this goodness of fit test
> is proposed to clarify my doubts. In any case, in the paper
>
> @article{barnes2008model,
> title={A model to predict outcomes for endovascular aneurysm repair
> using preoperative variables},
> author={Barnes, M. and Boult, M. and Maddern, G. and Fitridge, R.},
> journal={European Journal of Vascular and Endovascular Surgery},
> volume={35},
> number={5},
> pages={571--579},
> year={2008},
> publisher={Elsevier}
> }
>
> it is written:
>
> «Table 5 lists the results of the global goodness of fit test
> for each outcome model using the le Cessie-van
> Houwelingen-Copas-Hosmer unweighted sum of squares test.
> In the table a ‘good’ fit is indicated by large p-values
> (p > 0.05). Lack of fit is indicated by low p-values
> (p < 0.05). All p-values indicate that the outcome models
> have reasonable fit, with the exception of the outcome
> model for conversion to open repairs (p = 0.04). The
> low p-value suggests a lack of fit and it may be worth
> refining the model for conversion to open repair.»
>
> In short, according to these authors, low p-values seem to suggest
> lack of fit.
Sorry for the blank message.
So the topic is outcomes from surgery? The gof approach to
model assessment is just one way of looking at model comparison. The
real question is not "is this the right fit", but should rather be
"have I included as many relevant variables (for which I have data) as
I need to". You included exactly one variable. That would imply that
you had no prior knowledge about predictors of outcomes from surgery.
On the face of it, that seems highly implausible. Why are you even
contemplating a gof test in such a situation? Notice that those
authors said the effort should be made to "refine the model", not that
it "should be discarded".
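[Refining rather than discarding might mean adding candidate predictors and testing whether they improve the fit. A hedged sketch with invented variables (X16 and X2 are made up; assumes the rms package is installed):]

```r
library(rms)

set.seed(3)
n   <- 82
X16 <- factor(rbinom(n, 1, 0.5))   # stand-in for the poster's lone predictor
X2  <- rnorm(n)                    # hypothetical additional predictor
Y   <- rbinom(n, 1, plogis(-1 + 1.2 * (X16 == "1") + 0.5 * X2))

fit1 <- lrm(Y ~ X16,      x = TRUE, y = TRUE)
fit2 <- lrm(Y ~ X16 + X2, x = TRUE, y = TRUE)

# Likelihood-ratio test of the nested models: a small p-value says
# the richer model fits significantly better than the one-variable fit.
out <- lrtest(fit1, fit2)
out
```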
David Winsemius, MD
West Hartford, CT