[R] ROC optimal threshold
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Fri Mar 31 18:20:33 CEST 2006
Michael Kubovy wrote:
> Hi Tim and José,
>
>
>>>Date: Fri, 31 Mar 2006 11:58:14 +0200
>>>From: "Anadon Herrera, Jose Daniel" <jdanadon at umh.es>
>>>Subject: [R] ROC optimal threshold
>>>
>>>I am using the ROC package to evaluate predictive models
>>>I have successfully plotted the ROC curve; however,
>>>
>>>is there any way to obtain the value of the operating point, i.e. the
>>>optimal threshold value (the point on the curve nearest the top-left
>>>corner of the axes)?
>
>
> On Mar 31, 2006, at 8:01 AM, Tim Howard wrote:
>
>
>>I've struggled a bit with the same question, said another way: "how
>>do you find the value in a ROC curve that minimizes false positives
>>while maximizing true positives"?
>>
>>Here's something I've come up with. I'd be curious to hear from the
>>list whether anyone thinks this code might get stuck in local
>>minima, or if it does find the global minimum each time. (I think
>>it's ok).
>>
>>
>>From your ROC object you need to grab the sensitivity (= true
>>positive rate), the specificity (= 1 - false positive rate), and the
>>cutoff levels. Then find the value that minimizes abs(sensitivity -
>>specificity), or sqrt((1-sens)^2 + (1-spec)^2), as follows:
>>
>>absMin  <- extract[which.min(abs(extract$sens - extract$spec)), ]
>>sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]
>>
>>In this example, 'extract' is a data frame containing three columns:
>>extract$sens = sensitivity values, extract$spec = specificity
>>values, and extract$votes = cutoff values. Each command subsets the
>>data frame to the single row containing the desired cutoff and the
>>sens and spec values associated with it.
>>
>>Most of the time these two answers (abs or sqrt) are the same;
>>sometimes they differ quite a bit.
>>
>>I do not see this application of ROC curves very often. A question
>>for those much more knowledgeable than I am: is there a problem
>>with using ROC curves in this manner?
>>
>>Tim Howard
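
For concreteness, here is a small self-contained sketch of Tim's two
criteria. The simulated outcomes and predicted probabilities are purely
illustrative (not from the original post); in practice sens, spec, and
the cutoffs would come from the ROC object, as he describes.

## Hypothetical illustration of the two criteria above; data are simulated.
set.seed(1)
truth <- rbinom(200, 1, 0.4)                  # observed 0/1 outcomes
pred  <- plogis(rnorm(200) + 1.5 * truth)     # made-up predicted probabilities

cutoffs <- sort(unique(pred))
sens <- sapply(cutoffs, function(ct) mean(pred[truth == 1] >= ct))  # true positive rate
spec <- sapply(cutoffs, function(ct) mean(pred[truth == 0] <  ct))  # true negative rate
extract <- data.frame(sens = sens, spec = spec, votes = cutoffs)

## Cutoff where sensitivity and specificity are closest
absMin  <- extract[which.min(abs(extract$sens - extract$spec)), ]
## Cutoff closest to the top-left corner (0, 1) of the ROC plot
sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]
absMin
sqrtMin
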
>
>
> @BOOK{MacmillanCreelman2005,
> title = {Detection theory: {A} user's guide},
> publisher = {Lawrence Erlbaum Associates},
> year = {2005},
> address = {Mahwah, NJ, USA},
> edition = {2nd},
> author = {Macmillan, Neil A and Creelman, C Douglas},
> }
> on p. 43 shows that the ideal value of the cutoff depends on the
> reward function R that specifies the payoff for each outcome:
> \[
> LR(x) = \beta = \frac{R(\text{true negative}) - R(\text{false positive})}
>                      {R(\text{true positive}) - R(\text{false negative})}
>         \cdot \frac{p(\text{noise})}{p(\text{signal})}
> \]
>
> I believe that your attempt to minimize false positives while
> maximizing true positives amounts to maximizing the proportion of
> correct answers. For that you just set the rewards equal, so that
> $\beta = p(\text{noise})/p(\text{signal})$ (which reduces to
> $\beta = 1$ when the two classes are equally frequent). Otherwise it
> might be best to state your costs and benefits explicitly by
> specifying the reward function R.
> _____________________________
> Professor Michael Kubovy
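
To make the reward-function idea concrete, here is a sketch of one way to
apply it to a data frame like 'extract' above. The payoff values and base
rates are invented for illustration only; maximizing the expected reward
over cutoffs is equivalent to choosing the ROC point whose slope equals
the $\beta$ given by the formula.

## Illustrative only: made-up payoffs R(.) and base rates.
R_tp <- 1; R_fn <- -2; R_tn <- 0.5; R_fp <- -1   # payoff for each outcome
p_signal <- 0.4; p_noise <- 1 - p_signal         # base rates

## beta from the formula above
beta <- (R_tn - R_fp) / (R_tp - R_fn) * p_noise / p_signal

## Equivalent direct approach: expected reward at each cutoff in 'extract'
## (sens, spec, votes as in the earlier sketch), maximized over cutoffs.
expReward <- with(extract,
  p_signal * (sens * R_tp + (1 - sens) * R_fn) +
  p_noise  * (spec * R_tn + (1 - spec) * R_fp))
bestCut <- extract[which.max(expReward), ]
beta
bestCut
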
Choosing cutoffs is fraught with difficulties, arbitrariness, and
inefficiency, and it forces a complex adjustment for multiple
comparisons in later analysis steps unless the dataset used to generate
the cutoff was so large that it could be considered effectively infinite.
--
Frank E Harrell Jr, Professor and Chair, Department of Biostatistics,
Vanderbilt University School of Medicine