[R] How to calculate the area under the curve
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Thu Oct 22 17:01:08 CEST 2009
olivier.abz wrote:
> Hi all,
>
> I would like to calculate the area under the ROC curve for my predictive
> model. I have managed to plot points giving me the ROC curve. However, I do
> not know how to get the value of the area under.
> Does anybody know of a function that would give the result I want using an
> array of specificity and an array of sensitivity as input?
>
> Thanks,
>
> Olivier
Olivier,
The ROC curves in my view just get in the way. They are mainly useful
in that, almost by accident, the area under the curve equals a nice pure
discrimination index. Go for the direct calculation of the ROC area
based on the Wilcoxon-Mann-Whitney-Somers' Dxy rank correlation
approach, e.g., using the rcorr.cens function in the Hmisc package, which
provides Dxy = 2(C - .5), where C = ROC area. It also provides the S.E. of
Dxy and thus of C, and generalizes to censored data. This approach uses
the raw data, not sensitivity and specificity (which are improper
scoring rules). This is assuming you are using an external validation
dataset. If this is not the case you will need to use the bootstrap or
intensive cross-validation, e.g., using the rms package's lrm and
validate functions.
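
As a minimal sketch of the direct calculation (not code from the original
post): the data below are simulated, and the names p, y, and roc_area are
hypothetical, introduced only for illustration. The function computes C from
the Wilcoxon-Mann-Whitney rank-sum statistic in base R, and Dxy then follows
from Dxy = 2(C - .5). If Hmisc is installed, rcorr.cens is called on the same
raw predictions and outcomes so the two results can be compared.

```r
# Simulated validation data (hypothetical): binary outcome y and a
# predicted probability p from some previously developed model.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(x))
p <- plogis(x)

# C (ROC area) from the Wilcoxon-Mann-Whitney rank-sum statistic.
# rank() assigns midranks to ties, so tied predictions count as 1/2.
roc_area <- function(pred, outcome) {
  n1 <- sum(outcome == 1)
  n0 <- sum(outcome == 0)
  r  <- rank(pred)
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

C   <- roc_area(p, y)
Dxy <- 2 * (C - 0.5)
print(c(C = C, Dxy = Dxy))

# Same quantities from Hmisc, which also reports the S.D. of Dxy.
# Skipped when Hmisc is not installed.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::rcorr.cens(p, y)[c("C Index", "Dxy", "S.D.")])
}
```

Note that only the raw predictions and outcomes are needed; no arrays of
sensitivity and specificity are involved.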
Also note that it is not usually appropriate to compare two ROC areas
for choosing a model as this is too insensitive. It is the same as
taking the difference between two scaled Wilcoxon statistics which is
simply not done.
Frank
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University