[R-sig-eco] help with categorical responses in boosted classification trees (gbm package)

Jill Johnstone jill.johnstone at usask.ca
Fri Oct 10 18:41:06 CEST 2008


Hello,

I am working on developing code for a boosted classification tree that
predicts membership within 4 non-ordered classes, using the gbm or gbmplus
packages in R. I have (I think) used these packages successfully for
regression trees, where the response is numeric. However,
I'm running into problems setting up a boosted tree for a categorical
response that is not simply a 0,1 response. In my case, the response is a
non-ordered factor that represents different vegetation community types.
There are 4 factor levels and n=90 for the dataset.

I think the problem may be that I am not specifying a proper error
distribution. GBM help specifies the following options for this: 

"... "gaussian" (squared error), "laplace" (absolute loss), "bernoulli"
(logistic regression for 0-1 outcomes), "adaboost" (the AdaBoost exponential
loss for 0-1 outcomes), "poisson" (count outcomes), and "coxph" (censored
observations)."

I believe that the Gaussian error distribution is most appropriate for these
data, and this is what I've been using. Below is the code that I am running:

tree1 <- gbm(veg ~ lat+elev+moist.class+BA.stnd+pre.decid,
    data = natseed, n.tree=900, int=3, n.minobsinnode=5,     
    distribution="gaussian", shrinkage=0.003, 
    bag.fraction=0.5, cv.folds=5)
all.summary(tree1)

And the error I am currently getting specifies a problem with the
cross-validation, but I am not sure how to interpret this:
"Error in if (x[[1]]$type != "cv") stop("Not a CV tree !!\n") : argument is
of length zero"
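
One alternative I have wondered about, since the listed options only cover
0-1 outcomes for classification, is to recode the 4-level factor into four
0/1 indicator responses and fit one "bernoulli" model per class. The sketch
below shows only the recoding step in base R. The vector veg_demo and its
level names are made up for illustration and are not my real data, and I
have not verified that this is an appropriate way to handle the problem.

```r
# Hypothetical stand-in for the real 'veg' factor (4 non-ordered classes).
veg_demo <- factor(c("spruce", "aspen", "shrub", "grass", "aspen", "spruce"))

# One 0/1 indicator column per class (one-vs-rest recoding).
# sapply() over the level names returns a matrix with one named column per level.
indicators <- sapply(levels(veg_demo),
                     function(lev) as.integer(veg_demo == lev))

# Sanity check: every observation belongs to exactly one class.
stopifnot(all(rowSums(indicators) == 1))

# Each column could then be added to the data frame as a 0/1 response and
# modelled separately with distribution = "bernoulli" (an assumption about
# the workflow, not something I have tested).
print(indicators)
```

Each observation would then presumably be assigned to the class whose model
gives the highest predicted probability.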

I'd really appreciate suggestions about where I might be going wrong, if
anyone has any. I've been able to run this successfully as a regular
classification tree using the "tree" library, but had hoped to apply the
boosting approach. I've been referring to two excellent ecological papers
that describe this technique, but neither deals with this type of
classification tree:
1. De'ath, G. 2007. Boosted trees for ecological modeling and prediction.
Ecology 88: 243-251.
2. Elith, J., Leathwick, J.R., and Hastie, T. 2008. A working guide to
boosted regression trees. J. Animal Ecol. 77: 802-813.

Thanks in advance for any suggestions.

Jill Johnstone
assistant professor
Department of Biology
University of Saskatchewan
112 Science Place
Saskatoon SK S7N 5E2
ph:(306)966-4421  fax:966-4461
website: www.usask.ca/biology/johnstone/


