[R] Question about ROCR package
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Sun Feb 8 16:27:58 CET 2009
Tobias Sing wrote:
> Waverley,
>
> you can also use perf@y.values to access the slot (see
> help(performance-class) for a description of the slots).
>
> You might also want to have a look at the code for demo(ROCR) and at this
> slide deck:
> http://rocr.bioinf.mpi-sb.mpg.de/ROCR_Talk_Tobias_Sing.ppt
>
> HTH,
> Tobias
Tobias,
In my view there is one significant omission from your handout: high-resolution
calibration curves. There is a need to show that predictive
models predict accurately; see, for example, the val.prob function in the
Design package. The many graphs related to cumulative probabilities
are nice, but in some ways they get in the way of the fundamental
elements of absolute accuracy (calibration curves) and predictive
discrimination (a simple histogram of predicted probabilities, ignoring Y).
I go into this in my 1996 Stat in Med paper. In my view the
continuous accuracy measures need to be examined first, because
dichotomizations provide only crude approximations to be plugged into
decision making. Dichotomizations (classifiers) may provide good
decisions for a group of subjects but not so good decisions for every
individual member of the group. For one thing, different group members
have different loss/utility functions. For another, a predicted
probability of 0.5 may often best be summarized as "collect another
predictor variable for this subject."
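To make the calibration/discrimination distinction concrete, here is a minimal
sketch using simulated data (the data, model, and variable names are invented
purely for illustration; val.prob comes from the Design package):

library(Design)                      # provides val.prob()
set.seed(1)
n <- 500
x <- rnorm(n)
p <- plogis(-0.5 + x)                # predicted probabilities
y <- rbinom(n, 1, p)                 # binary outcome
val.prob(p, y)                       # high-resolution calibration curve + accuracy indexes
hist(p, nclass = 40, xlab = "Predicted probability",
     main = "Predicted probabilities (ignoring Y)")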
Related to this is that ROC-type measures result in a decision rule for
one subject that is a function of all the data of all the subjects in
the sample. This violates a basic principle of optimum Bayes decisions.
A related reference is below.
Just my $.02.
Frank
@Article{bri08ski,
  author  = {Briggs, William M. and Zaretzki, Russell},
  title   = {The skill plot: {A} graphical technique for evaluating
             continuous diagnostic tests (with discussion)},
  journal = {Biometrics},
  year    = 2008,
  volume  = 63,
  pages   = {250--261},
  annote  = {ROC curve; sensitivity; skill plot; skill score; specificity;
             diagnostic accuracy; diagnosis; ``statistics such as the AUC
             are not especially relevant to someone who must make a decision
             about a particular $x_{c}$. \ldots ROC curves lack or obscure
             several quantities that are necessary for evaluating the
             operational effectiveness of diagnostic tests. \ldots ROC
             curves were first used to check how radio \emph{receivers}
             (like radar receivers) operated over a range of frequencies.
             \ldots This is not how most ROC curves are used now,
             particularly in medicine. The receiver of a diagnostic
             measurement \ldots wants to make a decision based on some
             $x_{c}$, and is not especially interested in how well he would
             have done had he used some different cutoff.''; in the
             discussion David Hand states ``when integrating to yield the
             overall AUC measure, it is necessary to decide what weight to
             give each value in the integration. The AUC implicitly does
             this using a weighting derived empirically from the data. This
             is nonsensical. The relative importance of misclassifying a
             case as a noncase, compared to the reverse, cannot come from
             the data itself. It must come externally, from considerations
             of the severity one attaches to the different kinds of
             misclassifications.''}
}
>
> On Sat, Feb 7, 2009 at 10:40 PM, Jorge Ivan Velez
> <jorgeivanvelez at gmail.com> wrote:
>> Hi Waverley,
>> I forgot to tell you that "perf" is your performance object. Here is an
>> example from the ROCR package:
>> ## computing a simple ROC curve (x-axis: fpr, y-axis: tpr)
>> library(ROCR)
>> data(ROCR.simple)
>> pred <- prediction( ROCR.simple$predictions, ROCR.simple$labels)
>> perf <- performance(pred,"tpr","fpr")
>>
>> # y.values
>> unlist(slot(perf,"y.values"))
>>
>> HTH,
>>
>> Jorge
>>
>>
>>
>>> On Sat, Feb 7, 2009 at 3:17 PM, Waverley <waverley.paloalto at gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a question about the ROCR package. I got the ROC curve plotted
>>>> without any problem following the manual. However, I don't know how to
>>>> extract the values, e.g. y.values (I think it is the area under the
>>>> curve, AUC, measure). The return value is an object of class
>>>> "performance", which has slots, and one of the slots is "y.values". If
>>>> I type the object I can see the values on screen, but I want to extract
>>>> them for further programming and computation. I did a summary of the
>>>> object and it is of "S4" mode, which I don't understand.
>>>>
>>>> Can someone help?
>>>>
>>>> Thanks a lot in advance.
>>>>
>>>> --
>>>> Waverley @ Palo Alto
>>>>
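For the AUC itself, which the original question asked about, a minimal sketch
following the same slot-access pattern as Jorge's example ("auc" is one of the
measure names documented in help(performance)):

library(ROCR)
data(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
auc.perf <- performance(pred, measure = "auc")
auc.perf@y.values[[1]]                 # the AUC as a plain number
# equivalently: unlist(slot(auc.perf, "y.values"))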
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University