[R] Standard error for the area under a smoothed ROC curve?
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Wed Jan 12 14:18:39 CET 2005
Dan Bolser wrote:
> Hello,
>
> I am making some use of ROC curve analysis.
>
> I find much help on the mailing list, and I have used the Area Under the
> Curve (AUC) functions from the ROC function in the bioconductor project...
>
> http://www.bioconductor.org/repository/release1.5/package/Source/ROC_1.0.13.tar.gz
>
> However, I read here...
>
> http://www.medcalc.be/manual/mpage06-13b.php
>
> "The 95% confidence interval for the area can be used to test the
> hypothesis that the theoretical area is 0.5. If the confidence interval
> does not include the 0.5 value, then there is evidence that the laboratory
> test does have an ability to distinguish between the two groups (Hanley &
> McNeil, 1982; Zweig & Campbell, 1993)."
>
> But aside from that early passage, the article is short on details. Can anyone
> tell me how to calculate the CI of the AUC calculation?
>
>
> I read this...
>
> http://www.bioconductor.org/repository/devel/vignette/ROCnotes.pdf
>
> Which talks about resampling (by showing R code), but I can't understand
> what is going on, or what is calculated (the example given is specific to
> microarray analysis I think).
>
> I think a general AUC CI function would be a good addition to the ROC
> package.
>
>
>
>
> One more thing, in calculating the AUC I see the splines function is
> recommended over the approx function. Here...
>
> http://tolstoy.newcastle.edu.au/R/help/04/10/6138.html
>
> How would I rewrite the following AUC functions (adapted from bioconductor
> source) to use splines (or approxfun or splinefun) ...
>
>
> > spe # Specificity
>
> [1] 0.02173913 0.13043478 0.21739130 0.32608696 0.43478261 0.54347826
> [7] 0.65217391 0.76086957 0.89130435 1.00000000 1.00000000 1.00000000
> [13] 1.00000000
>
>
> > sen # Sensitivity
>
> [1] 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 0.9302326 0.8139535
> [8] 0.6976744 0.5581395 0.4418605 0.3488372 0.2325581 0.1162791
>
> trapezint(1-spe,sen)
> my.integrate(1-spe,sen)
>
> ## Functions
> ## Nicked (and modified) from the ROC function in bioconductor.
> "trapezint" <-
> function (x, y, a = 0, b = 1)
> {
>     if (x[1] > x[length(x)]) {
>         x <- rev(x)
>         y <- rev(y)
>     }
>     y <- y[x >= a & x <= b]
>     x <- x[x >= a & x <= b]
>     if (length(unique(x)) < 2)
>         return(NA)
>     ya <- approx(x, y, a, ties = max, rule = 2)$y
>     yb <- approx(x, y, b, ties = max, rule = 2)$y
>     x <- c(a, x, b)
>     y <- c(ya, y, yb)
>     h <- diff(x)
>     lx <- length(x)
>     0.5 * sum(h * (y[-1] + y[-lx]))
> }
>
> "my.integrate" <-
> function (x, y, t0 = 1)
> {
>     f <- function(j) approx(x, y, j, rule = 2, ties = max)$y
>     integrate(f, 0, t0)$value
> }
>
>
>
>
>
> Thanks for any pointers,
> Dan.
I don't see why the above formulas are being used. The
Bamber-Hanley-McNeil-Wilcoxon-Mann-Whitney nonparametric method works
great. Just get the U statistic (concordance probability) used in
Wilcoxon. As Somers' Dxy rank correlation coefficient is 2*(C - 0.5), where
C is the concordance or ROC area, the Hmisc package function rcorr.cens
uses U statistic methods to get the standard error of Dxy. You can
easily translate this to a standard error of C: since C = (Dxy + 1)/2,
the standard error of C is half the standard error of Dxy.
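
As a rough illustration of the rank-based route, the sketch below computes C from the Wilcoxon-Mann-Whitney U statistic in base R and attaches the Hanley-McNeil (1982) approximate standard error. This is an assumption-laden example, not what rcorr.cens does internally: the function names auc_wmw and auc_se_hm, the argument names, and the toy data are all made up for illustration, and the Hanley-McNeil formula is swapped in for the U statistic variance that Hmisc uses. The optional last step assumes that rcorr.cens returns a named vector whose "S.D." element is the standard error of Dxy.

```r
# Sketch: ROC area (concordance probability C) from the Wilcoxon-Mann-Whitney
# U statistic, with the Hanley-McNeil (1982) approximate standard error.
# All names and the toy data below are illustrative assumptions.

# score: numeric marker values; status: 1 = diseased, 0 = non-diseased
auc_wmw <- function(score, status) {
  pos <- score[status == 1]
  neg <- score[status == 0]
  n1 <- length(pos)
  n0 <- length(neg)
  r <- rank(c(pos, neg))                        # midranks, so ties count 1/2
  u <- sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2  # U statistic for the positives
  u / (n1 * n0)                                 # proportion of concordant pairs
}

# Hanley-McNeil approximation to SE(C); n1 = positives, n0 = negatives
auc_se_hm <- function(auc, n1, n0) {
  q1 <- auc / (2 - auc)
  q2 <- 2 * auc^2 / (1 + auc)
  sqrt((auc * (1 - auc) +
        (n1 - 1) * (q1 - auc^2) +
        (n0 - 1) * (q2 - auc^2)) / (n1 * n0))
}

# Toy data: 4 positives, 4 negatives, 15 of the 16 pairs concordant
score  <- c(1, 2, 3, 4, 5, 6, 7, 8)
status <- c(0, 0, 0, 1, 0, 1, 1, 1)

c_index <- auc_wmw(score, status)     # 15/16 = 0.9375
dxy     <- 2 * (c_index - 0.5)        # Somers' Dxy
se_c    <- auc_se_hm(c_index, sum(status == 1), sum(status == 0))
se_dxy  <- 2 * se_c                   # SE(Dxy) = 2 * SE(C)

# Wald-type 95% interval for C, truncated to [0, 1]
z  <- qnorm(0.975)
ci <- c(lower = max(0, c_index - z * se_c),
        upper = min(1, c_index + z * se_c))

# Optional: the Hmisc route from the post, run only if Hmisc is installed.
# Assumes the "S.D." element of the result is the standard error of Dxy.
if (requireNamespace("Hmisc", quietly = TRUE)) {
  rc <- Hmisc::rcorr.cens(score, status)
  se_c_hmisc <- unname(rc["S.D."]) / 2
}
```

With so few observations the untruncated Wald interval runs past 1, which is why the sketch clips it. The two standard errors need not agree, since the Hanley-McNeil formula is a model-based approximation.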
Frank
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University