[R] bias in AUCRF?

David Winsemius dwinsemius at comcast.net
Wed Nov 20 23:50:58 CET 2013


On Nov 20, 2013, at 12:44 PM, Jack Luo wrote:

> Hi,
> 
> I am using the AUCRF package for my data and was initially impressed by the
> high OOB-AUC it reported. After a while, though, I began to suspect that this
> might be due to some sort of bias, so I tested it on random data (generated
> using rnorm).
> 
> The design is very simple: 100 observations, 50 in class 0 and 50 in
> class 1. The number of variables is the one thing I varied (the idea being
> that if there is bias, the apparent performance should increase with more
> variables).
> 
> Presumably, there is no signal in the data and the true unbiased AUC should
> not be too different from 0.5.
> 
> The results are worrisome to me: the OOB AUC is a lot higher than 0.5, and
> with more variables, it gets even higher.
> 
> Am I misunderstanding anything here?
> 
> Below is the R code I used to test:
> 
> library(AUCRF)  # provides AUCRF()
> 
> Nvar = 200  # number of variables
> Label = as.factor(c(rep(0,50),rep(1,50)))  # class label
> AUC_r = NULL
> 
> for (k in 1:10) {  # control the randomness of generating random data
>  set.seed(k)
>  Arandom = matrix(rnorm(Nvar*length(Label)),ncol = Nvar)
>  DF = data.frame(Arandom,Label = Label)
>  for (j in 1:20) {  # control the randomness of OOB
>    if (j %% 10 == 0) {cat(k,j,"\n")}
>    set.seed(j)
>    fit <- AUCRF(Label~., data=DF)
>    AUC_r = cbind(AUC_r,fit$AUCcurve$AUC)
>  }
> }
> 
> plot(fit$AUCcurve$k,apply(AUC_r,1,mean),type = "b",pch = 3,
>      xlab = "# of Vars",lwd = 2,col = 2,ylab = "OOB-AUC",ylim = c(0.4,1))
> 

Shouldn't this question go to the package maintainer before being sent to R-help?
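
For what it's worth, one possible explanation (an assumption on my part, not something checked against the AUCRF source) is ordinary selection bias: if variables are ranked and chosen using the same observations that are then used to score the result, pure noise can look predictive. The sketch below is base R only and does not call AUCRF or randomForest. The helper `auc()`, the choice of the 10 top-ranked variables, and the signed-sum score are all made up for illustration; they stand in for the package's backward elimination, which works differently.

```r
set.seed(1)

n_var <- 200                          # number of pure-noise variables
label <- rep(c(0, 1), each = 50)      # 50 observations per class, as in the post

# Hypothetical helper: AUC via the Mann-Whitney rank-sum identity.
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Noise data: no variable carries any information about the label.
x <- matrix(rnorm(n_var * length(label)), ncol = n_var)

# Rank variables by how far their individual AUC is from 0.5,
# using the SAME observations that will be used for scoring.
var_auc <- apply(x, 2, auc, label = label)
top <- order(abs(var_auc - 0.5), decreasing = TRUE)[1:10]
sgn <- sign(var_auc[top] - 0.5)       # orient each variable toward class 1

# AUC of the signed sum on the data used for selection (optimistic).
auc_same <- auc(x[, top] %*% sgn, label)

# AUC of the same rule on fresh noise that played no part in selection.
label_new <- rep(c(0, 1), each = 500)
x_new <- matrix(rnorm(n_var * length(label_new)), ncol = n_var)
auc_new <- auc(x_new[, top] %*% sgn, label_new)

cat("AUC on selection data:", round(auc_same, 3), "\n")
cat("AUC on fresh data:    ", round(auc_new, 3), "\n")
```

If the first number comes out well above 0.5 while the second stays near it, that is the same pattern as the one described in the question, and it would suggest checking whether the reported OOB-AUC is computed on observations that also took part in the variable selection. Whether that is actually what happens inside AUCRF is a question for the maintainer.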

> 
> Thanks,
> 
> -Jack
> 
> 	[[alternative HTML version deleted]]
And:
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

-- 

David Winsemius
Alameda, CA, USA