[R] Can't reproduce ada example

Bob Flagg bob at calcworks.net
Thu Jul 7 19:58:21 CEST 2011


Dear R Users,

I'm having trouble reproducing the results in Section 5.1 of

Culp, M., Johnson, K., Michailidis, G. (2006). ada: an R Package for
Stochastic Boosting. Journal of Statistical Software, 16.

They build and display a boosting model with the code:

library("ada")
n <- 12000
p <- 10
set.seed(100)
x <- matrix(rnorm(n*p), ncol=p)
y <- as.factor(c(-1,1)[as.numeric(apply(x^2, 1, sum) > 9.34) + 1])
indtrain <- sample(1:n, 2000, FALSE)
train <- data.frame(y=y[indtrain], x[indtrain,])
test <- data.frame(y=y[-indtrain], x[-indtrain,])
control <- rpart.control(cp = -1,minsplit = 0,xval = 0,maxdepth = 1)
gdis <- ada(y~., data = train, iter = 400, bag.frac = 1, nu = 1,
            control = control, test.x = test[,-1], test.y = test[,1])
gdis
plot(gdis, TRUE, TRUE)
summary(gdis, n.iter = 398)
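
For reference, the encoding of the response can be checked without
fitting anything. The sketch below only regenerates y as in the listing
above and shows the order of its factor levels and the integer codes
behind them. Whether the level order has anything to do with the
discrepancy is a guess on my part, not something I have confirmed.

```r
# Regenerate the simulated response exactly as in the listing above
# (base R only, ada is not needed for this check).
n <- 12000
p <- 10
set.seed(100)
x <- matrix(rnorm(n * p), ncol = p)
y <- as.factor(c(-1, 1)[as.numeric(apply(x^2, 1, sum) > 9.34) + 1])

levels(y)                                 # order of the factor levels
table(y)                                  # class counts
table(codes = as.integer(y), label = y)   # internal integer codes vs. labels
```

The same calls can be applied to train$y and test$y to confirm that both
subsets carry the same levels in the same order.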

My problem is that my confusion matrix, testing results and diagnostic
plots differ from what is given in the paper.  My confusion matrix is

Final Confusion Matrix for Data:
          Final Prediction
True value   1  -1
        1  925  85
        -1  36 954

but the paper gives 

Final Confusion Matrix for Data:
          Final Prediction
True value   -1  1
        -1  954  36
        1   85   925

My Testing Results are

Accuracy: 0.111 Kappa: -0.777 

but the paper has Testing Results

Accuracy: 0.889 Kappa: 0.777

In the diagnostic plots, my test curves seem to be plotting (1 - Error)
rather than the error itself.

I can make the testing results and diagnostic plots match up if I
interchange labels in the test.y data:

gdis <- ada(y~., data = train, iter = 400, bag.frac = 1, nu = 1,
            control = control, test.x = test[,-1],
            test.y = ifelse(test[,1]==1,-1,1))

but I don't understand why that should work.
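
As a sanity check that does not involve ada at all, the sketch below
computes accuracy and Cohen's kappa from a 2x2 confusion matrix, then
recomputes them with the true labels interchanged. The helper acc_kappa
is my own, not part of ada, and I use the training confusion matrix from
the paper purely as illustrative counts, since I do not have the test
confusion matrix. Interchanging the labels turns accuracy a into 1 - a
(note that 0.111 = 1 - 0.889) and flips the sign of kappa. The magnitude
of kappa is preserved exactly only when the chance agreement is 0.5,
which is nearly the case for these roughly balanced classes.

```r
# Accuracy and Cohen's kappa from a 2x2 confusion matrix
# (rows = true value, columns = prediction). Hypothetical helper.
acc_kappa <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                        # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
  c(accuracy = po, kappa = (po - pe) / (1 - pe))
}

# Training confusion matrix as printed in the paper (illustrative counts)
tab <- matrix(c(954,  36,
                 85, 925),
              nrow = 2, byrow = TRUE,
              dimnames = list(truth = c("-1", "1"), pred = c("-1", "1")))

# Same predictions, but with the true labels interchanged (rows swapped)
tab_swapped <- tab[2:1, ]
rownames(tab_swapped) <- rownames(tab)

orig    <- acc_kappa(tab)
swapped <- acc_kappa(tab_swapped)
print(round(rbind(original = orig, swapped = swapped), 3))
```

So the numbers I get are what one would expect if the test labels were
being read in the opposite sense from the predictions, which is consistent
with the workaround but does not explain where the reversal happens.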

Any help you can provide will be much appreciated.  

Thanks,
Bob


