[BioC] Seeking assistance on ROC

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Feb 1 18:44:23 CET 2010


It's probably a mistake ... but I feel compelled to just try to add some input.

Susan: forget for a moment about the code that's necessary to do "ROC
Analysis" with R. As Sean mentioned, you should first check whether ROC
analysis is what you actually want.

The thing that seems most painful to me is that you previously wrote this:

"""
I tried with a range of thresholds from 0-0.9..As you had mentioned,the
true positive rates no doubt increased with thresholds below
0.9.However I did get some false positive rates even at a minimum threshold
of 0.1.Could you kindly explain the reason?

Is there any method of finding the optimal threshold,maximizing the true
positive rates while minimizing the false positives,instead of randomly
choosing between 0-0.9?
"""

And, again, as Sean mentioned, that's what a ROC curve is for.

So:

1. Presumably you have some binary classifier that is classifying
something of interest.
2. This classifier has some parameter(s) you can tune that adjust
its sensitivity vs. specificity tradeoff.
3. You want to determine the optimal value of this parameter for your
classifier that gives you the best trade off.
4. Let's assume you vary this parameter over MANY values. You can now
plot the sensitivity vs. specificity (1 - specificity to be precise) of
your classifier for all of these values to see *visually* what the
tradeoff is.
5. Assuming you're plotting this sensitivity vs. specificity point for
all of the values of your parameter ON THE SAME GRAPH, when you squint
your eyes enough, the shape that you will see emerging from your plot
is a curve. Your job is to find "the best" point on this curve.
6. You just have to find the top-left-most point on this curve: this
is your best value for the parameter since it gives you the best
combination of sensitivity (as high as possible on the y-axis) and
specificity (as far left as possible on the x-axis, since x = 1 -
specificity).
7. Finally, since you made this plot, you know which value of your
parameter was used to create each of the points on your curve. Just
take the value of the parameter that gives you the point on the curve
you found in (6).

Sorry, I'm not really inclined to provide any code that does this for
you ... or walk you through a tutorial of any package that does this
either. It's actually pretty straightforward to do so yourself
without using any R packages and just using "straight R".

Read through the ROC page on Wikipedia. The intro and basic concept
sections really tell you all you need to know, and that (along with my
surely lucid list of points above :-) should help you decide if "ROC
analysis" is what you're after:

http://en.wikipedia.org/wiki/Receiver_operating_characteristic

All the formulas you need are there (really just the true positive
rate, false positive rate). Assuming you have the known labels for a
set of data that you are using your classifier to predict on, you
should be able to whip up some R code that makes the ROC plot with
some effort.
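
For illustration only, here is a minimal sketch in straight R of steps
4 to 7 above. Everything in it is an assumption made up for the
example: the toy `labels` and `scores` vectors, the names `roc_point`,
`roc` and `best`, the rule "predict positive when score >= threshold",
and the choice of "closest to the top-left corner (0, 1)" as the
meaning of "best" point. Your own classifier and parameter may well
call for something different.

```r
# Hypothetical toy data: labels are the known classes (1 = positive,
# 0 = negative), scores are the classifier outputs for the same items.
labels <- c(1, 1, 0, 1, 1, 0, 0, 1, 0, 0)
scores <- c(0.95, 0.85, 0.80, 0.70, 0.55, 0.45, 0.40, 0.30, 0.20, 0.10)

# True/false positive rate at one threshold, assuming an item is
# called "positive" when its score is >= the threshold.
roc_point <- function(threshold, scores, labels) {
  predicted <- scores >= threshold
  tpr <- sum(predicted & labels == 1) / sum(labels == 1)  # sensitivity
  fpr <- sum(predicted & labels == 0) / sum(labels == 0)  # 1 - specificity
  c(threshold = threshold, fpr = fpr, tpr = tpr)
}

# Step 4: vary the parameter over many values. Using the observed
# scores (plus Inf, so that the point (0, 0) is included) avoids
# floating point surprises from a seq() grid.
thresholds <- c(sort(unique(scores)), Inf)
roc <- as.data.frame(t(sapply(thresholds, roc_point,
                              scores = scores, labels = labels)))

# Step 5: put all the points on the same graph (skipped in batch mode).
if (interactive()) {
  plot(roc$fpr, roc$tpr, type = "b",
       xlab = "False positive rate (1 - specificity)",
       ylab = "True positive rate (sensitivity)")
}

# Steps 6 and 7: take the point closest to the top-left corner and
# read off the threshold that produced it.
dist_to_corner <- sqrt(roc$fpr^2 + (1 - roc$tpr)^2)
best <- roc[which.min(dist_to_corner), ]
print(best)
```

With this toy data the selected row has threshold 0.55, a true
positive rate of 0.8 and a false positive rate of 0.2.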

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


