[R] ROC optimal threshold
ramasamy at cancer.org.uk
Fri Mar 31 17:42:32 CEST 2006
If you define a cost function for a given threshold k as
    cost(k) = FP(k) + lambda * FN(k)
then choose the k that minimises the cost. FP(k) and FN(k) are the numbers
of false positives and false negatives at threshold k.
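Below is a minimal R sketch of this cost-based choice. It assumes the predicted scores and the true 0/1 classes are available as plain vectors. The names score, label and optimal_threshold are hypothetical and not part of any package. The candidate thresholds are the observed scores, and a case counts as positive when its score is >= k:

```r
# Sketch: choose the threshold k that minimises cost(k) = FP(k) + lambda * FN(k).
# Assumes 'score' holds predicted scores and 'label' the true class (1 = event, 0 = non-event);
# a case is predicted positive when score >= k. Candidate thresholds are the observed scores.
optimal_threshold <- function(score, label, lambda = 1) {
  thresholds <- sort(unique(score))
  cost <- sapply(thresholds, function(k) {
    pred <- score >= k
    fp <- sum(pred & (label == 0))     # non-events flagged as events
    fn <- sum((!pred) & (label == 1))  # events that were missed
    fp + lambda * fn
  })
  best <- which.min(cost)              # exhaustive scan over all candidates: global minimum
  list(threshold = thresholds[best], cost = cost[best])
}

# Usage with made-up data
score <- c(0.1, 0.3, 0.35, 0.6, 0.8, 0.9)
label <- c(0,   0,   1,    0,   1,   1)
optimal_threshold(score, label, lambda = 1)
```

On this made-up data the cost is 1 at both k = 0.35 and k = 0.8, and which.min returns the first of these, 0.35. Setting lambda = 0.5 makes missed events cheap, and the choice then moves to 0.8.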
Set lambda to a value greater than 1 if you want to penalise FN more
than FP. There are many situations where this is desirable, for example
when the class sizes are highly unbalanced. Consider a problem where you
want to predict rare events and you are penalised much more heavily for
missing an event than for wrongly flagging a non-event.
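The following sketch shows how lambda affects a rare-event problem. It uses a small made-up data set with 2 events among 10 observations (all names and values are hypothetical). The same exhaustive search over the observed scores is run for lambda = 1 and lambda = 5:

```r
# Hypothetical rare-event data: 2 events (label 1) among 10 observations.
score <- c(0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.50, 0.60, 0.90, 0.95)
label <- c(0,    0,    0,    0,    1,    0,    0,    0,    0,    1)

# cost(k) = FP(k) + lambda * FN(k), predicting an event when score >= k
cost_at <- function(k, lambda) {
  pred <- score >= k
  sum(pred & (label == 0)) + lambda * sum((!pred) & (label == 1))
}

# Threshold with the smallest cost among the observed scores
best_threshold <- function(lambda) {
  thresholds <- sort(unique(score))
  costs <- sapply(thresholds, cost_at, lambda = lambda)
  thresholds[which.min(costs)]
}

sapply(c(1, 5), best_threshold)
```

With lambda = 1 the chosen threshold is 0.95, which catches only one of the two events. With lambda = 5 it drops to 0.35, and both events are caught at the price of four false positives.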
I believe the ROC curve was designed to compare two methods over a range
of thresholds, not to choose the threshold itself.
On Fri, 2006-03-31 at 08:01 -0500, Tim Howard wrote:
> Jose -
> I've struggled a bit with the same question, put another way: "How do you find the point on a ROC curve that minimizes false positives while maximizing true positives?"
> Here's something I've come up with. I'd be curious to hear from the list whether anyone thinks this code might get stuck in local minima, or if it does find the global minimum each time. (I think it's ok).
> From your ROC object you need to grab the sensitivity (= true positive rate), the specificity (= 1 - false positive rate) and the cutoff levels. Then find the value that minimizes abs(sensitivity - specificity), or sqrt((1-sens)^2 + (1-spec)^2), as follows:
> absMin <- extract[which.min(abs(extract$sens - extract$spec)), ]
> sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]
> In this example, 'extract' is a data frame containing three columns: extract$sens = sensitivity values, extract$spec = specificity values and extract$votes = cutoff values. Each command subsets the data frame to a single row containing the desired cutoff and its associated sens and spec values.
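Here is a self-contained version of the above. It assumes the cutoffs come from a numeric score and the true classes are coded 0/1. The names score, label and cutoffs are hypothetical, and 'extract' is built by hand rather than pulled from a ROC object. On the local-minimum question: which.min scans every row of the table, so it always returns the global minimum over the tabulated cutoffs, with ties going to the first row. There is no iterative search that could get stuck.

```r
# Hypothetical example: 'score' plays the role of the cutoff variable (votes)
# and 'label' is the true class (1 = presence, 0 = absence).
score <- c(1, 2, 3, 4, 5, 6, 7, 8)
label <- c(0, 0, 1, 0, 1, 0, 1, 1)

# One row per candidate cutoff; a case is called positive when score >= cutoff.
cutoffs <- sort(unique(score))
extract <- data.frame(
  sens  = sapply(cutoffs, function(k) mean(score[label == 1] >= k)),  # true positive rate
  spec  = sapply(cutoffs, function(k) mean(score[label == 0] <  k)),  # true negative rate
  votes = cutoffs
)

# Cutoff where sensitivity and specificity are closest to each other
absMin  <- extract[which.min(abs(extract$sens - extract$spec)), ]
# Cutoff whose ROC point is closest to the top-left corner (sens = 1, spec = 1)
sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]
```

Here both criteria pick votes = 5, where sens = spec = 0.75.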
> Most of the time these two answers (abs or sqrt) are the same, but sometimes they differ quite a bit.
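The two criteria can disagree because abs(sens - spec) only asks for balance, while the distance criterion asks for closeness to the top-left corner. An unbalanced point can be closer to that corner than a balanced one. A hand-made three-row 'extract' illustrates this (all values are hypothetical):

```r
# Hypothetical table of (sensitivity, specificity, cutoff) triples
extract <- data.frame(sens  = c(0.95, 0.90, 0.70),
                      spec  = c(0.40, 0.60, 0.70),
                      votes = c(2, 3, 5))

absMin  <- extract[which.min(abs(extract$sens - extract$spec)), ]
sqrtMin <- extract[which.min(sqrt((1 - extract$sens)^2 + (1 - extract$spec)^2)), ]

absMin$votes   # 5: the balanced point, sens = spec = 0.70
sqrtMin$votes  # 3: closer to the corner, distance sqrt(0.17) vs sqrt(0.18)
```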
> I do not see this application of ROC curves very often. A question for those much more knowledgeable than I: is there a problem with using ROC curves in this manner?
> Tim Howard
> Date: Fri, 31 Mar 2006 11:58:14 +0200
> From: "Anadon Herrera, Jose Daniel" <jdanadon at umh.es>
> Subject: [R] ROC optimal threshold
> To: "'r-help at stat.math.ethz.ch'" <r-help at stat.math.ethz.ch>
> I am using the ROC package to evaluate predictive models.
> I have successfully plotted the ROC curve; however,
> is there any way to obtain the value of the operating point = optimal threshold
> value (i.e. the nearest point of the curve to the top-left corner of the
> thank you very much,
> jose daniel anadon
> area de ecologia
> universidad miguel hernandez
> R-help at stat.math.ethz.ch mailing list