[R] Some questions on Rpart algorithm
Marcus, Jeffrey
Jeffrey.Marcus at nuance.com
Tue Oct 17 16:03:04 CEST 2006
Hello:
I am using rpart and would like more background on how the splits are made
and how to interpret the results, and also on how to use text.rpart properly.
I have looked through Venables and Ripley and through the rpart help and still
have some questions. If there is a source (say, Breiman et al.) on decision
trees that would clear this all up, please let me know. The questions below
pertain to a classification task (i.e., I'm using the "class" method). Many
thanks in advance.
(1) I'd like text.rpart to print the percentage of each class rather than
counts. I don't see an option for this, so I would like to modify
text.rpart. However, I can't find the source, since it is a "hidden"
method. How can I find the source?
(2) printcp prints a table with columns cp, nsplit, rel error, xerror, xstd.
I am guessing that cp is complexity, nsplit is the number of the split, rel
error is the error on test set, xerror is cross-validation error and xstd is
standard deviation of error across the cross-validation sets. Is there any
documentation on this? For instance, how exactly is complexity computed?
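For (2), the table I mean can be reproduced roughly as follows. This is only a sketch: the kyphosis data frame that ships with rpart stands in for my own data, and the formula is chosen purely for illustration.

```r
library(rpart)

# kyphosis is a stand-in dataset bundled with rpart; my real data differ.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")

# Print the complexity table that question (2) asks about.
printcp(fit)

# The same table is stored on the fitted object as a matrix.
cp_table <- fit$cptable
colnames(cp_table)
```

Note that xerror and xstd come from random cross-validation folds, so they change from run to run unless set.seed() is called first.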
(3) What is a "loss matrix"? Is it the cost placed on each type of
misclassification?
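For (3), this is how I understand a loss matrix would be supplied, assuming it is passed through the parms argument and needs zeros on the diagonal. The cost values are made up, kyphosis is again a stand-in for my data, and reading rows as the true class and columns as the predicted class is my assumption.

```r
library(rpart)

# Hypothetical costs for the two classes (absent, present).
# Assumed layout: rows = true class, columns = predicted class.
# Here, missing a "present" case is taken to cost 4 times a false alarm.
loss <- matrix(c(0, 1,
                 4, 0), nrow = 2, byrow = TRUE)

fit_loss <- rpart(Kyphosis ~ Age + Number + Start,
                  data = kyphosis, method = "class",
                  parms = list(loss = loss))

print(fit_loss)
```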
(4) [More of a methodology question] In practice, when would one use
different costs on different splitting variables?
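For (4), the argument I am asking about appears to be cost, which takes one non-negative value per predictor in the order they appear in the model. The sketch below uses arbitrary values and the bundled kyphosis data as a stand-in. As I read the help page, the improvement from splitting on a variable is divided by its cost, so a higher cost makes that variable less likely to be chosen.

```r
library(rpart)

# Arbitrary illustrative costs for Age, Number, Start (in formula order).
# A cost of 2 on Start is assumed to make splits on Start less attractive.
var_cost <- c(1, 1, 2)

fit_cost <- rpart(Kyphosis ~ Age + Number + Start,
                  data = kyphosis, method = "class",
                  cost = var_cost)

print(fit_cost)
```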
Thanks for any help on this.
Jeff