[R] user-written splits in rpart

Verspagen B (ALGEC) b.verspagen at maastrichtuniversity.nl
Thu Dec 17 20:15:22 CET 2009


--- begin original message ---
I am trying to write my own split function for rpart. The aim is to do,
instead of anova, a linear regression to determine the split (minimize
some criterion like sum of rss left and right of the split). The
regression (lm) should simply use the dependent and independent
variables passed to rpart.

I am aware of the example provided in the rpart source code, but
stumbled on similar problems that I saw reported on this list (no final
solution posted, as far as I could see). The problem is, broadly
speaking, that I do not see a way to access the full set of x and y
variables in the user-written split-function.
---- end original message -----------

---- begin reply -----------
The rpart routine provides the x variables to a user-written split
function one at a time.  Since the entire structure of rpart --
printing, plotting, tree representation, etc --- is based on the premise
of a single variable driving each split,  what you are asking for would
require an entirely different program.
---- end reply -----------

I have been reflecting a little bit on this. Perhaps I have been unclear. 
It is fine with me that a single variable "drives" the split, i.e., that we partition the data based on the ranking of a single variable. I am not looking to change that. 
What I would like to change is how that split is evaluated. In rpart's anova implementation, the evaluation uses the mean of the dependent variable on each side of the split as the predictor. Instead, I would like to run a linear regression on each side of the split, involving all x-variables that were supplied in the call to rpart, and use the predictive power of those regressions to evaluate the split.
I can't see how that would change the structure of the entire module.
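
The evaluation described above can be sketched outside rpart as follows. This is only an illustration under stated assumptions: it is not wired into rpart's user-written split interface, the helper name split_rss_lm and its arguments (split_var, min_n) are hypothetical, and the criterion is assumed to be the summed residual sum of squares of the two regressions. A single variable orders the observations, and each candidate cut is scored by fitting lm on the left and right subsets with all x-variables.

```r
# Hypothetical helper (not part of rpart): for one ordering variable, score
# every candidate cut by the summed RSS of two linear regressions, one fitted
# to the observations left of the cut and one to those right of it.
split_rss_lm <- function(formula, data, split_var, min_n = 10) {
  ord <- order(data[[split_var]])
  d <- data[ord, , drop = FALSE]
  x <- d[[split_var]]
  n <- nrow(d)

  # Candidate positions: between distinct consecutive values of the ordering
  # variable, leaving at least min_n observations on each side.
  pos <- which(diff(x) != 0)
  pos <- pos[pos >= min_n & pos <= n - min_n]

  rss <- vapply(pos, function(i) {
    left  <- lm(formula, data = d[seq_len(i), , drop = FALSE])
    right <- lm(formula, data = d[(i + 1):n, , drop = FALSE])
    sum(resid(left)^2) + sum(resid(right)^2)
  }, numeric(1))

  # Report each cut as the midpoint between neighbouring values.
  data.frame(cut = (x[pos] + x[pos + 1]) / 2, rss = rss)
}

# Simulated data (assumption for illustration): the regression of y on x2
# changes at x1 = 0.5, so the best cut on x1 should lie near 0.5.
set.seed(1)
n <- 200
dat <- data.frame(x1 = runif(n), x2 = rnorm(n))
dat$y <- ifelse(dat$x1 < 0.5, 1 + 2 * dat$x2, 4 - 3 * dat$x2) +
  rnorm(n, sd = 0.1)

res <- split_rss_lm(y ~ x1 + x2, dat, "x1")
best <- res[which.min(res$rss), ]
print(best)
```

The partition itself is still driven by one variable (x1 here); only the score attached to each candidate cut uses the full set of regressors. Refitting lm at every cut is slow, so this is meant to show the criterion, not an efficient implementation.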

Best regards,

Bart Verspagen



