[R] Logistic Regression - Interpreting SENS (Sensitivity) and SPEC (Specificity)

Mon Oct 13 09:52:30 CEST 2008

On Mon, 13 Oct 2008, Peter Dalgaard wrote:

> Dieter Menne wrote:
>> Maithili Shiva <maithili_shiva <at> yahoo.com> writes:
>> 
>>> I have a main sample of 42500 clients and, based on their status as
>>> defaulted / non-defaulted, I have generated the probability of default.
>>> I have a hold-out sample of 5000 clients. I have calculated (1) number of
>>> correctly classified goods (Gg), (2) number of correctly classified bads
>>> (Bb), (3) number of wrongly classified bads (Gb) and (4) number of
>>> wrongly classified goods (Bg).
>> 
>> The simple and wrong answer is to use these data directly to compute
>> sensitivity (fraction of hits). This measure is useless, but I encounter it
>> often in medical publications.
>> 
>> You can get a more reasonable answer by using cross-validation. Check, for
>> example, Frank Harrell's 
>> http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf
>
> But if he has a "hold out sample", isn't he already cross-validating??  I 
> wonder if you're answering the right question there. Could he just be looking 
> for Sp=Gg/(Gg+Bg), Se=Bb/(Gb+Bb)? (If I got the notation right.)

Strictly no, she is 'validating' (no 'cross-' involved).  Cross-validation
would be a better idea for much smaller sample sizes (we don't know how
many regressors are involved, so say sample sizes in the hundreds, unless
there are more than ten regressors).
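
For the two quantities Peter asks about, the arithmetic on the hold-out
counts could be sketched as below. This assumes the notation reads
'predicted class, then true class' (so Gb is a true bad classified as good)
and that 'bad' (default) is treated as the positive class. It is written in
Python purely for illustration; the function name sens_spec and the example
counts are hypothetical, not the poster's data.

```python
def sens_spec(Gg, Gb, Bg, Bb):
    """Return (sensitivity, specificity) from four hold-out counts.

    Assumed notation: first letter = predicted class, second = true class.
      Gg: true goods classified as good   (correct)
      Bb: true bads  classified as bad    (correct)
      Gb: true bads  classified as good   (wrong)
      Bg: true goods classified as bad    (wrong)
    'Bad' (default) is taken as the positive class.
    """
    true_bads = Gb + Bb
    true_goods = Gg + Bg
    if true_bads == 0 or true_goods == 0:
        raise ValueError("need at least one true good and one true bad")
    se = Bb / true_bads    # Se = Bb / (Gb + Bb)
    sp = Gg / true_goods   # Sp = Gg / (Gg + Bg)
    return se, sp


# Hypothetical counts for a hold-out sample of 5000 clients.
se, sp = sens_spec(Gg=4200, Gb=150, Bg=300, Bb=350)
print(f"Se = {se:.3f}, Sp = {sp:.3f}")
```

Both figures depend on the probability cut-off used to label a client good
or bad, so they describe one particular classification rule rather than the
fitted model as a whole.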

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595