[R] MART(tm) vs. gbm
manuel.martin
manuel.martin at orleans.inra.fr
Fri May 26 16:31:04 CEST 2006
Hello,
I have been using two different implementations of stochastic
gradient boosting (Friedman 2002): MART(tm) with R and the gbm package.
The two are fairly comparable, except that MART with R systematically
outperforms the gbm tool in terms of goodness of fit, sometimes
strongly so (depending on the dataset).
For instance, the two calls below:
# gbm package
gbm1 <- gbm(Y~X2+X3+X4+X5+X6,
            data=data,
            var.monotone=c(0,0,0,0,0), # 0: no monotone restrictions
            distribution="gaussian",   # bernoulli, adaboost, gaussian,
                                       # poisson, and coxph available
            n.trees=3000,              # number of trees
            shrinkage=0.005,           # shrinkage or learning rate,
                                       # 0.001 to 0.1 usually work
            interaction.depth=6,       # 1: additive model, 2: two-way
                                       # interactions, etc.
            bag.fraction=0.5,          # subsampling fraction, 0.5 is
                                       # probably best
            train.fraction=0.5,        # fraction of data for training;
                                       # first train.fraction*N rows used
            n.minobsinnode=10,         # minimum total weight needed
                                       # in each node
            cv.folds=5,                # do 5-fold cross-validation
            keep.data=TRUE,            # keep a copy of the dataset
                                       # with the object
            verbose=TRUE)              # print out progress
# MART with R
X <- as.matrix(cbind(data$X2, as.numeric(data$X3),
                     as.numeric(data$X4), as.numeric(data$X5), data$X6))
Y <- data$Y
mart(X, Y, c(1,2,2,2,1),   # variable types: 1 = numeric, 2 = categorical
     niter=3000,           # number of trees
     tree.size=6,          # terminal nodes per tree
     learn.rate=0.005,     # shrinkage / learning rate
     loss.cri=2)           # 2 = least-squares (gaussian) loss, as in gbm
lead to very different goodness-of-fit results (I can provide the
dataset if needed).
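For reference, here is how I compare the two fits; a minimal sketch,
assuming gbm's documented gbm.perf()/predict() functions and a
martpred() prediction routine as in Friedman's MART interface (that
call may be named differently in other MART builds):

# Measure each model's fit with the same squared-error criterion
# on the same rows.
best.iter <- gbm.perf(gbm1, method="cv")   # CV-optimal number of trees
pred.gbm  <- predict(gbm1, newdata=data, n.trees=best.iter)
pred.mart <- martpred(X)  # ASSUMPTION: prediction call from Friedman's
                          # MART R interface; adapt to your installation
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse(data$Y, pred.gbm)
rmse(Y, pred.mart)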
Has anyone encountered this before? Is there an explanation, or am I
missing something obvious in the argument settings?
Thank you in advance,
Manuel