[R] AUC, C-index and p-value of Wilcoxon

Thu Feb 9 19:45:39 CET 2012

On Thu, Feb 09, 2012 at 06:33:09PM +0100, Petr Savicky wrote:
> On Thu, Feb 09, 2012 at 02:05:08PM +0000, linda Porz wrote:
> > Dear all,
> > 
> > I am using the ROCR library to compute the AUC and also the Hmisc library
> > to compute the C-index of a predictor and a group variable. The results of
> > AUC and C-index are similar and give a value of about 0.57. The Wilcoxon
> > p-value is <0.001! Why is the AUC so small while the p-value is
> > highly significant? Isn't the AUC based on the Wilcoxon calculation?
> 
> Hi.
> 
> There is no direct relationship between the AUC and the p-value of
> the Wilcoxon test. The AUC measures how well the two distributions
> can be separated. The p-value measures how strong the evidence is
> that the distributions differ at all. The test can be significant
> even when the difference is tiny, as long as the data make it clear
> that some difference exists. This happens with a large sample size.
> As the sample size increases, the AUC for separating variables X, Y
> converges to P(X < Y), which may be 0.57, while the p-value
> converges to 0.

This effect may be demonstrated as follows. Try

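  n <- 50
  for (i in 1:10) {
      x <- rnorm(n)
      y <- rnorm(n) + 0.25   # y is shifted by 0.25 relative to x
      out <- wilcox.test(x, y, paired=FALSE)
      # W counts the pairs with x > y, so 1 - W/n^2 estimates P(X < Y)
      AUC <- 1 - out$statistic/n^2
      cat(AUC, out$p.value, "\n")
  }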

The result may be

  0.6132 0.05147433 
  0.5396 0.4971117 
  0.54 0.492754 
  0.5444 0.446199 
  0.5528 0.3646515 
  0.5692 0.2343673 
  0.6168 0.044479 
  0.5748 0.1985487 
  0.5152 0.796007 
  0.5528 0.3646515 

Two of the ten p-values are close to 0.05, the rest are
clearly not significant.

Try the same with n <- 1000. The result may be

  0.564124 6.84383e-07 
  0.572522 1.953299e-08 
  0.575895 4.170283e-09 
  0.5651 4.623185e-07 
  0.584841 5.029007e-11 
  0.567354 1.829507e-07 
  0.601411 4.053585e-15 
  0.608903 3.356404e-17 
  0.583801 8.610077e-11 
  0.570637 4.497502e-08 

The AUC estimates have lower variance, but otherwise
have similar values. However, the p-values are now
small, since the sample is larger. With a larger sample,
the test is more sensitive and detects even a small
difference as a significant one.
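
As a further check, here is a minimal sketch. It is not part of the
runs above: the seed, the sample sizes and the names shift, auc_limit,
auc_w and auc_pairs are chosen only for illustration. It computes the
limiting value P(X < Y) for the normal distributions used above,
verifies that 1 - W/n^2 is the fraction of pairs with x < y, and
prints n, the AUC estimate and the p-value for increasing n.

```r
shift <- 0.25

# X ~ N(0, 1) and Y ~ N(shift, 1) independent, so Y - X ~ N(shift, 2)
# and P(X < Y) = pnorm(shift / sqrt(2)), about 0.57 for shift = 0.25.
auc_limit <- pnorm(shift / sqrt(2))
cat("limiting AUC:", auc_limit, "\n")

set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# 1 - W/n^2 equals the fraction of pairs (x[i], y[j]) with x[i] < y[j]
n <- 50
x <- rnorm(n)
y <- rnorm(n) + shift
auc_w <- 1 - unname(wilcox.test(x, y)$statistic) / n^2
auc_pairs <- mean(outer(x, y, "<"))
stopifnot(isTRUE(all.equal(auc_w, auc_pairs)))

# AUC estimate and p-value for increasing sample size
for (n in c(50, 200, 1000, 5000)) {
    x <- rnorm(n)
    y <- rnorm(n) + shift
    out <- wilcox.test(x, y, paired=FALSE)
    cat(n, 1 - unname(out$statistic) / n^2, out$p.value, "\n")
}
```

The AUC column should settle near the limiting value, while the
p-value keeps shrinking as n grows.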

Hope this helps.

Petr Savicky.