[R] Can't reproduce ada example
Bob Flagg
bob at calcworks.net
Thu Jul 7 19:58:21 CEST 2011
Dear R Users,
I'm having trouble reproducing the results in Section 5.1 of
Culp, M., Johnson, K., Michailidis, G. (2006). ada: an R Package for
Stochastic Boosting. Journal of Statistical Software, 16.
They build and display a boosting model with the code:
library("ada")
n <- 12000
p <- 10
set.seed(100)
x <- matrix(rnorm(n*p), ncol=p)
y <- as.factor(c(-1,1)[as.numeric(apply(x^2, 1, sum) > 9.34) + 1])
indtrain <- sample(1:n, 2000, FALSE)
train <- data.frame(y=y[indtrain], x[indtrain,])
test <- data.frame(y=y[-indtrain], x[-indtrain,])
control <- rpart.control(cp = -1, minsplit = 0, xval = 0, maxdepth = 1)
gdis <- ada(y ~ ., data = train, iter = 400, bag.frac = 1, nu = 1,
            control = control, test.x = test[,-1], test.y = test[,1])
gdis
plot(gdis, TRUE, TRUE)
summary(gdis, n.iter = 398)
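As a side note on the setup: the cutoff 9.34 is close to the median of a chi-squared distribution with 10 degrees of freedom, so the two classes should come out roughly balanced. A minimal base R sketch to check this, with no ada needed (the names cutoff and class_share are mine, not from the paper):

```r
# Same simulated data as in the paper's example.
n <- 12000
p <- 10
set.seed(100)
x <- matrix(rnorm(n * p), ncol = p)
y <- as.factor(c(-1, 1)[as.numeric(apply(x^2, 1, sum) > 9.34) + 1])

# Each row sum of x^2 is chi-squared with p degrees of freedom,
# so the median of that distribution is the balanced cutoff.
cutoff <- qchisq(0.5, df = p)
print(cutoff)

# Share of each class in the full sample; both should be near 0.5.
class_share <- prop.table(table(y))
print(class_share)

# Level order of the response, which fixes the row/column order of tables.
print(levels(y))
```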
My problem is that my confusion matrix, testing results and diagnostic
plots differ from what is given in the paper. My confusion matrix is
Final Confusion Matrix for Data:
          Final Prediction
True value   1  -1
         1 925  85
        -1  36 954
but the paper gives
Final Confusion Matrix for Data:
          Final Prediction
True value  -1   1
        -1 954  36
         1  85 925
My Testing Results are
Accuracy: 0.111 Kappa: -0.777
but the paper has Testing Results
Accuracy: 0.889 Kappa: 0.777
In the diagnostic plots, my test curves appear to show (1-Error)
rather than the Error shown in the paper.
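The two sets of testing results look like mirror images of each other, which is what one would expect if the two class labels were interchanged on one side of the comparison. A minimal sketch of that arithmetic in base R, using the paper's training confusion matrix from above purely as illustrative counts (the helpers accuracy and kappa_stat are my own and are not part of ada):

```r
# Illustrative 2x2 table: rows = true value, columns = prediction.
cm <- matrix(c(954, 36,
               85, 925),
             nrow = 2, byrow = TRUE,
             dimnames = list(truth = c("-1", "1"), pred = c("-1", "1")))

# Proportion of cases on the diagonal.
accuracy <- function(tab) sum(diag(tab)) / sum(tab)

# Cohen's kappa: observed agreement corrected for chance agreement.
kappa_stat <- function(tab) {
  po <- accuracy(tab)
  pe <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
  (po - pe) / (1 - pe)
}

# Interchanging the true labels swaps the two rows of the table
# while the row names stay where they are.
cm_swapped <- cm[2:1, ]
rownames(cm_swapped) <- rownames(cm)

print(c(accuracy(cm), accuracy(cm_swapped)))
print(c(kappa_stat(cm), kappa_stat(cm_swapped)))
```

With the labels interchanged, the accuracy becomes exactly one minus the original and kappa changes sign. The magnitude of kappa is preserved only approximately, and only when the classes are close to balanced.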
I can make the testing results and diagnostic plots match up if I
interchange labels in the test.y data:
gdis <- ada(y ~ ., data = train, iter = 400, bag.frac = 1, nu = 1,
            control = control, test.x = test[,-1],
            test.y = ifelse(test[,1] == 1, -1, 1))
but I don't understand why that should work.
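One thing that might be worth checking is how the class labels are coded when a factor is turned into numbers, since test[,1] is a factor while the ifelse() call above returns a plain numeric vector. The following is only a sketch of base R behaviour, not a statement about what ada does internally. The variable f is made up for illustration:

```r
# A factor with the same labels as y in the example.
f <- factor(c(-1, 1, 1, -1))
print(levels(f))

# as.numeric() on a factor returns the internal level codes (1, 2),
# not the labels themselves.
codes <- as.numeric(f)
print(codes)

# Going through as.character() recovers the original labels (-1, 1).
labels_num <- as.numeric(as.character(f))
print(labels_num)

# The relabelling used in the workaround: a numeric vector with signs flipped.
flipped <- ifelse(f == 1, -1, 1)
print(flipped)
```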
Any help you can provide will be much appreciated.
Thanks,
Bob