[R] Logistic regression model + precision/recall
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Wed Jan 24 15:59:44 CET 2007
nitin jindal wrote:
> On 1/24/07, Frank E Harrell Jr <f.harrell at vanderbilt.edu> wrote:
>
>> Why 0.5?
>
>
> The probability cutoff has to be adjusted by trial and error. I just
> mentioned 0.5 as an example.
Using a cutoff is not a good idea unless the utility (loss) function is
discontinuous and is the same for every subject (in the medical field
utilities are almost never constant). And if you are using the data to
find the cutoff, this will require bootstrapping to penalize for the
cutoff not being pre-specified.
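As a hedged illustration of why one fixed cutoff such as 0.5 quietly assumes the same utilities for everyone, here is a minimal sketch. It assumes a simple two-cost loss that the post does not specify: a false positive costs cost_fp and a false negative costs cost_fn. Under that assumption, expected loss is minimised by acting as if the subject is positive when p > cost_fp / (cost_fp + cost_fn). Equal costs give 0.5, and subject-specific costs give subject-specific thresholds. The helper names and all cost values below are hypothetical.

```python
def bayes_threshold(cost_fp: float, cost_fn: float) -> float:
    """Risk above which 'act as positive' has the lower expected loss.

    Acting as positive costs (1 - p) * cost_fp in expectation;
    acting as negative costs p * cost_fn. Act as positive when
    p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def decide(p: float, cost_fp: float, cost_fn: float) -> bool:
    """True if acting as if the subject is positive minimises expected loss."""
    return p > bayes_threshold(cost_fp, cost_fn)


# Hypothetical subjects: identical predicted risk, different (made-up) costs.
subjects = [
    {"id": "A", "p": 0.30, "cost_fp": 1.0, "cost_fn": 1.0},  # equal costs: threshold 0.5
    {"id": "B", "p": 0.30, "cost_fp": 1.0, "cost_fn": 9.0},  # missing a case is costly: 0.1
    {"id": "C", "p": 0.30, "cost_fp": 4.0, "cost_fn": 1.0},  # acting is costly: 0.8
]

for s in subjects:
    t = bayes_threshold(s["cost_fp"], s["cost_fn"])
    print(s["id"], round(t, 2), decide(s["p"], s["cost_fp"], s["cost_fn"]))
```

With the same predicted risk of 0.30, only subject B crosses its own threshold. This sketch does not cover the bootstrap step needed when the cutoff is chosen from the data.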
>
>> Those are improper scoring rules that can be tricked. If the outcome is
>> rare (say 0.02 incidence) you could just predict that no one will have
>> the outcome and be correct 0.98 of the time. I suggest validating the
>> model for discrimination (e.g., AUC) and calibration.
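A minimal sketch of the point above, using made-up data (1000 subjects, 0.02 incidence) rather than anything from the thread. The "predict no one has the outcome" rule reaches 0.98 accuracy, while the concordance index (AUC) of a constant risk of 0.02 is 0.5, i.e. no discrimination. The Brier score is swapped in here as one example of a proper scoring rule; it is not named in the post. Comparing mean predicted risk with the observed rate is only calibration-in-the-large, a much weaker check than a full calibration curve. All function names are hypothetical helpers.

```python
def accuracy(y, yhat):
    """Proportion of subjects whose 0/1 classification matches the outcome."""
    return sum(a == b for a, b in zip(y, yhat)) / len(y)


def auc(y, p):
    """Concordance probability (ROC AUC): share of (case, non-case) pairs in
    which the case has the higher predicted risk; ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(pos) * len(neg))


def brier(y, p):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


# Made-up data: 20 cases among 1000 subjects (0.02 incidence).
n, n_pos = 1000, 20
y = [1] * n_pos + [0] * (n - n_pos)
p_const = [0.02] * n   # everyone gets the same predicted risk
yhat_none = [0] * n    # "no one will have the outcome"

print("accuracy:", accuracy(y, yhat_none))
print("AUC:", auc(y, p_const))
print("mean predicted vs observed:", round(sum(p_const) / n, 4), sum(y) / n)
print("Brier:", round(brier(y, p_const), 4))
```

The constant predictor looks excellent on accuracy and is even calibrated on average, yet its AUC of 0.5 shows it cannot separate cases from non-cases.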
>
>
> I just have to calculate precision/recall for a rare outcome. If the positive
> outcome is rare (say 0.02 incidence) and I predict it to be negative all
> the time, my recall would be 0, which is bad. So, precision and recall can
> take care of skewed data.
No, that is not clear. The overall classification error would only be
0.02 in that case. It is true, though, that one of the two conditional
probabilities (sensitivity, i.e. recall) would be poor.
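To make the exchange concrete, here is a small sketch on the same kind of made-up data (20 cases in 1000 subjects) for the "always negative" rule. It computes the overall error alongside the two conditional probabilities, sensitivity P(predicted positive | case) and specificity P(predicted negative | non-case), plus precision. Recall is another name for sensitivity. The helper names are hypothetical, and precision is returned as None because with no predicted positives it is 0/0.

```python
def confusion(y, yhat):
    """Counts of true positives, false negatives, false positives, true negatives."""
    tp = sum(1 for a, b in zip(y, yhat) if a == 1 and b == 1)
    fn = sum(1 for a, b in zip(y, yhat) if a == 1 and b == 0)
    fp = sum(1 for a, b in zip(y, yhat) if a == 0 and b == 1)
    tn = sum(1 for a, b in zip(y, yhat) if a == 0 and b == 0)
    return tp, fn, fp, tn


def summarize(y, yhat):
    """Overall error plus conditional rates; None marks an undefined 0/0 ratio."""
    tp, fn, fp, tn = confusion(y, yhat)
    n = tp + fn + fp + tn
    return {
        "error": (fn + fp) / n,
        "sensitivity (recall)": tp / (tp + fn) if tp + fn else None,  # P(yhat=1 | y=1)
        "specificity": tn / (tn + fp) if tn + fp else None,           # P(yhat=0 | y=0)
        "precision": tp / (tp + fp) if tp + fp else None,             # P(y=1 | yhat=1)
    }


# Made-up data: 0.02 incidence, and a rule that calls everyone negative.
y = [1] * 20 + [0] * 980
print(summarize(y, [0] * 1000))
```

This reports an error of 0.02, sensitivity (recall) 0.0, specificity 1.0 and an undefined precision: the overall error looks small while one conditional probability is as bad as it can be.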
>
> Frank
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University