[R] Goodness of fit of binary logistic model
Paul Smith
phhs80 at gmail.com
Fri Aug 5 20:29:54 CEST 2011
On Fri, Aug 5, 2011 at 7:07 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>> I have just estimated this model:
>>>>>> -----------------------------------------------------------
>>>>>> Logistic Regression Model
>>>>>>
>>>>>> lrm(formula = Y ~ X16, x = T, y = T)
>>>>>>
>>>>>> Model Likelihood Discrimination Rank Discrim.
>>>>>> Ratio Test Indexes Indexes
>>>>>>
>>>>>> Obs 82 LR chi2 5.58 R2 0.088 C
>>>>>> 0.607
>>>>>> 0 46 d.f. 1 g 0.488 Dxy
>>>>>> 0.215
>>>>>> 1 36 Pr(> chi2) 0.0182 gr 1.629 gamma
>>>>>> 0.589
>>>>>> max |deriv| 9e-11 gp 0.107 tau-a
>>>>>> 0.107
>>>>>> Brier 0.231
>>>>>>
>>>>>> Coef S.E. Wald Z Pr(>|Z|)
>>>>>> Intercept -1.3218 0.5627 -2.35 0.0188
>>>>>> X16=1 1.3535 0.6166 2.20 0.0282
>>>>>> -----------------------------------------------------------
>>>>>>
>>>>>> Analyzing the goodness of fit:
>>>>>>
>>>>>> -----------------------------------------------------------
>>>>>>>
>>>>>>> resid(model.lrm,'gof')
>>>>>>
>>>>>> Sum of squared errors Expected value|H0 SD
>>>>>> 1.890393e+01 1.890393e+01 6.073415e-16
>>>>>> Z P
>>>>>> -8.638125e+04 0.000000e+00
>>>>>> -----------------------------------------------------------
>>>>>>
>>>>>>> From the above calculated p-value (0.000000e+00), one should discard
>>>>>>
>>>>>> this model. However, there is something that is puzzling me: If the
>>>>>> 'Expected value|H0' is so coincidental with the 'Sum of squared
>>>>>> errors', why should one discard the model? I am certainly missing
>>>>>> something.
>>>>>
>>>>> It's hard to tell what you are missing, since you have not described
>>>>> your
>>>>> reasoning at all. So I guess what is at error is your expectation that
>>>>> we
>>>>> would have drawn all of the unstated inferences that you draw when
>>>>> offered
>>>>> the output from lrm. (I certainly did not draw the inference that "one
>>>>> should discard the model".)
>>>>>
>>>>> resid is a function designed for use with glm and lm models. Why aren't
>>>>> you
>>>>> using residuals.lrm?
>>>>
>>>> ----------------------------------------------------------
>>>>>
>>>>> residuals.lrm(model.lrm,'gof')
>>>>
>>>> Sum of squared errors Expected value|H0 SD
>>>> 1.890393e+01 1.890393e+01 6.073415e-16
>>>> Z P
>>>> -8.638125e+04 0.000000e+00
>>>
>>> Great. Now please answer the more fundamental question. Why do you think
>>> this mean "discard the model"?
>>
>> Before answering that, let me tell you
>>
>> resid(model.lrm,'gof')
>>
>> calls residuals.lrm() -- so both approaches produce the same results.
>> (See the examples given by ?residuals.lrm)
>>
>> To answer your question, I invoke the reasoning given by Frank Harrell at:
>>
>>
>> http://r.789695.n4.nabble.com/Hosmer-Lemeshow-goodness-of-fit-td3508127.html
>>
>> He writes:
>>
>> «The test in the rms package's residuals.lrm function is the le Cessie
>> - van Houwelingen - Copas - Hosmer unweighted sum of squares test for
>> global goodness of fit. Like all statistical tests, a large P-value
>> has no information other than there was not sufficient evidence to
>> reject the null hypothesis. Here the null hypothesis is that the true
>> probabilities are those specified by the model. »
>>
>
> How does that apply to your situation? You have a small (one might even say
> infinitesimal) p-value.
>
>
>>> From Harrell's argument does not follow that if the p-value is zero
>>
>> one should reject the null hypothesis?
>
> No, it doesn't follow at all, since that is not what he said. You are
> committing a common logical error. If A then B does _not_ imply If Not-A
> then Not-B.
>
>> Please, correct if it is not
>> correct what I say, and please direct me towards a way of establishing
>> the goodness of fit of my model.
>
> You need to state your research objectives and describe the science in your
> domain. They you need to describe your data gathering methods and your
> analytic process. Then there might be a basis for further comment.
I will try to read the original paper where this goodness of fit test
is proposed to clarify my doubts. In any case, in the paper
@article{barnes2008model,
title={A model to predict outcomes for endovascular aneurysm repair
using preoperative variables},
author={Barnes, M. and Boult, M. and Maddern, G. and Fitridge, R.},
journal={European Journal of Vascular and Endovascular Surgery},
volume={35},
number={5},
pages={571--579},
year={2008},
publisher={Elsevier}
}
it is written:
«Table 5 lists the results of the global goodness of fit test
for each outcome model using the le Cessie-van Houwe-
lingen-Copas-Hosmer unweighted sum of squares test.
In the table a ‘good’ fit is indicated by large p-values
( p > 0.05). Lack of fit is indicated by low p-values
( p < 0.05). All p-values indicate that the outcome models
have reasonable fit, with the exception of the outcome
model for conversion to open repairs ( p ¼ 0.04). The
low p-value suggests a lack of fit and it may be worth
refining the model for conversion to open repair.»
In short, according to these authors, low p-values seem to suggest lack of fit.
Paul
More information about the R-help
mailing list