[R] ROC optimal threshold
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Fri Mar 31 18:20:33 CEST 2006
Michael Kubovy wrote:
> Hi Tim and José,
>
>
>>>Date: Fri, 31 Mar 2006 11:58:14 +0200
>>>From: "Anadon Herrera, Jose Daniel" <jdanadon at umh.es>
>>>Subject: [R] ROC optimal threshold
>>>
>>>I am using the ROC package to evaluate predictive models
>>>I have successfully plotted the ROC curve; however,
>>>
>>>is there any way to obtain the value of the operating point, i.e. the
>>>optimal threshold value (the point on the curve nearest the top-left
>>>corner of the axes)?
>
>
> On Mar 31, 2006, at 8:01 AM, Tim Howard wrote:
>
>
>>I've struggled a bit with the same question, said another way: "how
>>do you find the value in a ROC curve that minimizes false positives
>>while maximizing true positives"?
>>
>>Here's something I've come up with. I'd be curious to hear from the
>>list whether anyone thinks this code might get stuck in local
>>minima, or if it does find the global minimum each time. (I think
>>it's ok).
>>
>>
>>From your ROC object you need to grab the sensitivity (= true
>>positive rate), the specificity (= 1 - false positive rate), and the
>>cutoff levels. Then find the value that minimizes abs(sensitivity -
>>specificity), or sqrt((1-sens)^2 + (1-spec)^2), as follows:
>>
>>absMin  <- extract[which.min(abs(extract$sens - extract$spec)), ]
>>sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]
>>
>>In this example, 'extract' is a data frame containing three columns:
>>extract$sens = sensitivity values, extract$spec = specificity
>>values, and extract$votes = cutoff values. Each command subsets the
>>data frame to the single row containing the desired cutoff and the
>>sens and spec values associated with it.
>>
>>Most of the time these two answers (abs or sqrt) are the same;
>>sometimes they differ quite a bit.
>>
>>I do not see this application of ROC curves very often. A question
>>for those much more knowledgeable than I am: is there a problem
>>with using ROC curves in this manner?
>>
>>Tim Howard
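
For concreteness, here is a small self-contained sketch of Tim's two
criteria. The simulated outcomes and predicted probabilities are purely
illustrative (not from the original post); in practice sens, spec, and
the cutoffs would come from the ROC object, as he describes.

## Hypothetical illustration of the two criteria above; data are simulated.
set.seed(1)
truth <- rbinom(200, 1, 0.4)                  # observed 0/1 outcomes
pred  <- plogis(rnorm(200) + 1.5 * truth)     # made-up predicted probabilities

cutoffs <- sort(unique(pred))
sens <- sapply(cutoffs, function(ct) mean(pred[truth == 1] >= ct))  # true positive rate
spec <- sapply(cutoffs, function(ct) mean(pred[truth == 0] <  ct))  # true negative rate
extract <- data.frame(sens = sens, spec = spec, votes = cutoffs)

## Cutoff where sensitivity and specificity are closest
absMin  <- extract[which.min(abs(extract$sens - extract$spec)), ]
## Cutoff closest to the top-left corner (0, 1) of the ROC plot
sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]
absMin
sqrtMin
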
>
>
> @BOOK{MacmillanCreelman2005,
> title = {Detection theory: {A} user's guide},
> publisher = {Lawrence Erlbaum Associates},
> year = {2005},
> address = {Mahwah, NJ, USA},
> edition = {2nd},
> author = {Macmillan, Neil A and Creelman, C Douglas},
> }
> on p. 43 shows that the ideal value of the cutoff depends on the
> reward function R that specifies the payoff for each outcome:
> \[
> LR(x) = \beta = \frac{R(\text{true negative}) - R(\text{false positive})}
>                      {R(\text{true positive}) - R(\text{false negative})}
>         \cdot \frac{p(\text{noise})}{p(\text{signal})}
> \]
>
> I believe that your attempt to minimize false positives while
> maximizing true positives amounts to maximizing the proportion of
> correct answers. For that you just set the rewards equal, so that
> $\beta = p(\text{noise})/p(\text{signal})$ (which reduces to
> $\beta = 1$ when the two classes are equally frequent). Otherwise it
> might be best to state your costs and benefits explicitly by
> specifying the reward function R.
> _____________________________
> Professor Michael Kubovy
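
To make the reward-function idea concrete, here is a sketch of one way to
apply it to a data frame like 'extract' above. The payoff values and base
rates are invented for illustration only; maximizing the expected reward
over cutoffs is equivalent to choosing the ROC point whose slope equals
the $\beta$ given by the formula.

## Illustrative only: made-up payoffs R(.) and base rates.
R_tp <- 1; R_fn <- -2; R_tn <- 0.5; R_fp <- -1   # payoff for each outcome
p_signal <- 0.4; p_noise <- 1 - p_signal         # base rates

## beta from the formula above
beta <- (R_tn - R_fp) / (R_tp - R_fn) * p_noise / p_signal

## Equivalent direct approach: expected reward at each cutoff in 'extract'
## (sens, spec, votes as in the earlier sketch), maximized over cutoffs.
expReward <- with(extract,
  p_signal * (sens * R_tp + (1 - sens) * R_fn) +
  p_noise  * (spec * R_tn + (1 - spec) * R_fp))
bestCut <- extract[which.max(expReward), ]
beta
bestCut
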
Choosing cutoffs is fraught with difficulties, arbitrariness, and
inefficiency, and it forces a complex adjustment for multiple
comparisons in later analysis steps unless the dataset used to generate
the cutoff was so large that it could be considered effectively infinite.
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics,
Vanderbilt University School of Medicine