[R] questions on rpart (tree changes when rearrange the order of covariates)
therneau at mayo.edu
Wed May 13 14:26:33 CEST 2009
If two variables give splits with exactly the same improvement, then rpart will
use the one that comes first in the model formula. So if the call is
    rpart(group ~ age + height + weight + sex)
and at some node both age and weight give a split with 20 correct and 9
incorrect, then age will be used to split at that node.
Even though the error of the age and weight splits is the same, the set of 9
subjects that are incorrect may be different, i.e., the two splits don't send
exactly the same observations to the left and the right. Thus, the rest of the
tree from that point on may be different, giving a different fit.
For continuous y it is rare for two splits to have exactly the same R^2, but
such ties are not uncommon in classification problems.
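
The tie described above can be reproduced with a small made-up example. This
is only a sketch: the data frame d, its values, the cutpoints 45 and 75, and
the helper root_var() are all invented for illustration and are not from the
original question. The two covariates are built so that each one splits the
root node with 4 errors, but on different subjects. Under the tie rule above
one would expect the root split to follow whichever variable is listed first.

```r
library(rpart)

# Hypothetical data: 10 subjects in class "a", then 10 in class "b".
d <- data.frame(
  group  = factor(rep(c("a", "b"), each = 10)),
  age    = c(rep(30, 8), rep(60, 2), rep(30, 2), rep(60, 8)),
  weight = c(rep(90, 2), rep(60, 8), rep(90, 8), rep(60, 2))
)

# Each variable has a single possible cutpoint. Count by hand which subjects
# the rule "low value -> class a" gets wrong.
wrong_age    <- which((d$age < 45)    != (d$group == "a"))
wrong_weight <- which((d$weight < 75) != (d$group == "a"))
err_age    <- length(wrong_age)     # 4 errors
err_weight <- length(wrong_weight)  # 4 errors, but on different subjects

# Fit the same model twice, changing only the order of the covariates.
# xval = 0 turns off cross-validation so nothing depends on random numbers.
ctrl <- rpart.control(xval = 0)
fit_age_first    <- rpart(group ~ age + weight, data = d,
                          method = "class", control = ctrl)
fit_weight_first <- rpart(group ~ weight + age, data = d,
                          method = "class", control = ctrl)

# Hypothetical helper: name of the variable used at the root node.
root_var <- function(fit) as.character(fit$frame$var[1])

print(wrong_age)
print(wrong_weight)
print(root_var(fit_age_first))
print(root_var(fit_weight_first))

# Fitted classes per subject; these can disagree between the two trees.
print(table(predict(fit_age_first, type = "class"),
            predict(fit_weight_first, type = "class")))
```

Because the two splits misclassify different subjects, the two fits need not
agree observation by observation even though their error counts match. In a
larger data set the child nodes would also differ, and so could every split
below them.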