[R] How to calculate the area under the curve

Thu Oct 22 17:01:08 CEST 2009

olivier.abz wrote:
> Hi all, 
> 
> I would like to calculate the area under the ROC curve for my predictive
> model. I have managed to plot points giving me the ROC curve. However, I do
> not know how to get the value of the area under. 
> Does anybody know of a function that would give the result I want using an
> array of specificity and an array of sensitivity as input?
> 
> Thanks, 
> 
> Olivier

Olivier,

The ROC curves in my view just get in the way.  They are mainly useful 
in that, almost by accident, the area under the curve equals a nice pure 
discrimination index.  Go for the direct calculation of the ROC area 
based on the Wilcoxon-Mann-Whitney-Somers' Dxy rank correlation 
approach, e.g., using the rcorr.cens function in the Hmisc package, which 
provides Dxy = 2(C-.5) where C = ROC area.  It also provides the S.E. of 
Dxy and thus of C, and generalizes to censored data.  This approach uses 
the raw data, not sensitivity and specificity (which are improper 
scoring rules).  This is assuming you are using an external validation 
dataset.  If this is not the case you will need to use the bootstrap or 
intensive cross-validation, e.g., using the rms package's lrm and 
validate functions.
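
A minimal sketch of the direct calculation from raw data, in base R.  The 
helper name c_index and the eight data points are made up for 
illustration; they are not part of Hmisc or of the original question.  
The sketch assumes a 0/1 outcome and a score where larger values mean 
"more likely 1", and uses the Wilcoxon rank-sum form of the ROC area.

```r
# Hypothetical helper (not from any package): ROC area C from raw data.
#   pred : predicted probability, or any score where higher means y = 1
#   y    : binary outcome coded 0/1
c_index <- function(pred, y) {
  stopifnot(length(pred) == length(y), all(y %in% c(0, 1)))
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  stopifnot(n1 > 0, n0 > 0)
  r <- rank(pred)   # midranks, so a tied (1, 0) pair counts as 1/2
  # Wilcoxon-Mann-Whitney statistic scaled to the 0-1 range
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up illustration data
y    <- c(0, 0, 0, 0, 1, 1, 1, 1)
pred <- c(0.10, 0.30, 0.35, 0.60, 0.35, 0.70, 0.80, 0.90)

C   <- c_index(pred, y)   # ROC area
Dxy <- 2 * (C - 0.5)      # Somers' Dxy
print(c(C = C, Dxy = Dxy))

# If Hmisc is installed, rcorr.cens reports the same kind of quantities
# plus a standard error for Dxy (named "S.D." in its output).
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::rcorr.cens(pred, y)[c("C Index", "Dxy", "S.D.")])
}
```

Here 14.5 of the 16 (event, non-event) pairs are concordant when the one 
tie is counted as half, so C = 0.90625 and Dxy = 0.8125.  Since 
C = Dxy/2 + 0.5, the S.E. of C is half the S.E. of Dxy.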

Also note that it is not usually appropriate to compare two ROC areas 
for choosing a model as this is too insensitive.  It is the same as 
taking the difference between two scaled Wilcoxon statistics which is 
simply not done.

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University