[R] Standard error for the area under a smoothed ROC curve?

Wed Jan 12 14:18:39 CET 2005

Dan Bolser wrote:
> Hello, 
> 
> I am making some use of ROC curve analysis. 
> 
> I find much help on the mailing list, and I have used the Area Under the
> Curve (AUC) functions from the ROC package in the bioconductor project...
> 
> http://www.bioconductor.org/repository/release1.5/package/Source/ROC_1.0.13.tar.gz
> 
> However, I read here...
> 
> http://www.medcalc.be/manual/mpage06-13b.php
> 
> "The 95% confidence interval for the area can be used to test the
> hypothesis that the theoretical area is 0.5. If the confidence interval
> does not include the 0.5 value, then there is evidence that the laboratory
> test does have an ability to distinguish between the two groups (Hanley &
> McNeil, 1982; Zweig & Campbell, 1993)."
> 
> But beyond that opening statement, the article is short on details. Can
> anyone tell me how to calculate the CI of the AUC?
> 
> 
> I read this...
> 
> http://www.bioconductor.org/repository/devel/vignette/ROCnotes.pdf
> 
> Which talks about resampling (by showing R code), but I can't understand
> what is going on, or what is calculated (the example given is specific to
> microarray analysis I think).
> 
> I think a general AUC CI function would be a good addition to the ROC
> package.
> 
> 
> 
> 
> One more thing: in calculating the AUC, I see the spline function is
> recommended over the approx function. Here...
> 
> http://tolstoy.newcastle.edu.au/R/help/04/10/6138.html
> 
> How would I rewrite the following AUC functions (adapted from bioconductor
> source) to use splines (or approxfun or splinefun) ...
> 
> 
>>spe # Specificity
> 
>  [1] 0.02173913 0.13043478 0.21739130 0.32608696 0.43478261 0.54347826
>  [7] 0.65217391 0.76086957 0.89130435 1.00000000 1.00000000 1.00000000
> [13] 1.00000000
> 
> 
>>sen # Sensitivity
> 
>  [1] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.9302326 0.8139535
>  [8] 0.6976744 0.5581395 0.4418605 0.3488372 0.2325581 0.1162791
> 
> trapezint(1-spe,sen)
> my.integrate(1-spe,sen)
> 
> ## Functions
> ## Nicked (and modified) from the ROC function in bioconductor.
> "trapezint" <-
> function (x, y, a = 0, b = 1)
> {
>     if (x[1] > x[length(x)]) {
>       x <- rev(x)
>       y <- rev(y)
>     }
>     y <- y[x >= a & x <= b]
>     x <- x[x >= a & x <= b]
>     if (length(unique(x)) < 2)
>         return(NA)
>     ya <- approx(x, y, a, ties = max, rule = 2)$y
>     yb <- approx(x, y, b, ties = max, rule = 2)$y
>     x <- c(a, x, b)
>     y <- c(ya, y, yb)
>     h <- diff(x)
>     lx <- length(x)
>     0.5 * sum(h * (y[-1] + y[-lx]))
> }
> 
> "my.integrate" <-
> function (x, y, t0 = 1)
> {
>     f <- function(j) approx(x,y,j,rule=2,ties=max)$y
>     integrate(f, 0, t0)$value
> }
> 
> 
> 
> 
> 
> Thanks for any pointers,
> Dan.

I don't see why the above formulas are being used.  The 
Bamber-Hanley-McNeil-Wilcoxon-Mann-Whitney nonparametric method works 
great.  Just get the U statistic used in the Wilcoxon test; divided by 
the number of positive-negative pairs, it is the concordance probability.  
As Somers' Dxy rank correlation coefficient is 2*(C-0.5) where 
C is the concordance or ROC area, the Hmisc package function rcorr.cens 
uses U statistic methods to get the standard error of Dxy.  You can 
easily translate this to a standard error of C: since C = 0.5 + Dxy/2, 
the standard error of C is half the standard error of Dxy.
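
A minimal base-R sketch of that nonparametric route, for illustration only. 
The function name auc_se and the simulated data are made up here, and the 
standard error uses the Hanley & McNeil (1982) approximation rather than 
the U-statistic variance that rcorr.cens computes, so the two will not 
agree exactly. The optional Hmisc cross-check at the end assumes that 
rcorr.cens returns elements named "C Index" and "S.D." (the latter being 
the standard error of Dxy).

```r
# Nonparametric AUC (concordance probability) with an approximate SE.
# 'score' is the test result, 'y' the 0/1 group label (1 = positive).
auc_se <- function(score, y) {
  stopifnot(length(score) == length(y), all(y %in% c(0, 1)))
  n1 <- sum(y == 1)                          # number of positives
  n0 <- sum(y == 0)                          # number of negatives
  r  <- rank(score)                          # midranks, so ties count 1/2
  U  <- sum(r[y == 1]) - n1 * (n1 + 1) / 2   # Mann-Whitney U statistic
  cidx <- U / (n1 * n0)                      # concordance = ROC area
  # Hanley & McNeil (1982) approximation to the variance of the area
  Q1 <- cidx / (2 - cidx)
  Q2 <- 2 * cidx^2 / (1 + cidx)
  se <- sqrt((cidx * (1 - cidx) +
              (n1 - 1) * (Q1 - cidx^2) +
              (n0 - 1) * (Q2 - cidx^2)) / (n1 * n0))
  z <- qnorm(0.975)                          # normal-theory 95% interval
  c(C = cidx, Dxy = 2 * (cidx - 0.5), se.C = se,
    lower = cidx - z * se, upper = cidx + z * se)
}

# Simulated illustration data (hypothetical, not the poster's data)
set.seed(1)
y     <- rep(c(0, 1), each = 40)
score <- rnorm(80, mean = y)                 # positives shifted upward
res   <- auc_se(score, y)
print(res)

# The same C from wilcox.test: its statistic W is the U statistic
w <- wilcox.test(score[y == 1], score[y == 0])$statistic
print(unname(w) / (40 * 40))

# Optional cross-check with Hmisc, if installed: SE(C) = SE(Dxy) / 2
if (requireNamespace("Hmisc", quietly = TRUE)) {
  rc <- Hmisc::rcorr.cens(score, y)
  print(c(C = unname(rc["C Index"]), se.C = unname(rc["S.D."]) / 2))
}
```

If the interval from lower to upper excludes 0.5, that is the evidence of 
discrimination the MedCalc page refers to. The normal-theory interval can 
overshoot 1 when C is close to 1, so treat it as a rough guide there.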

Frank

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University