[R] rpart: apply tree to new data to get "counts"
Stephen Milborrow
milbo at sonic.net
Tue Aug 30 20:14:45 CEST 2011
Jay <josip.2000 at gmail.com> wrote:
> When I have made a decision tree with rpart, is it possible to "apply"
> this tree to a new set of data in order to find out the distribution
> of observations? Ideally I would like to plot my original tree, with
> the counts (at each node) of the new data.
Sadly, neither plot.rpart nor rpart.plot supports plotting a tree trained on
one set of data while showing results predicted for a new set of data. Page
21 of the vignette for the rpart.plot package has this to say:
"Arguably the most serious limitation of the current implementation is its
inability to display results on test data (on the tree derived from the
training data)."
One way of implementing this (quite a lot of work) would be to extend the
rpart function to include a newdata argument. If given such an argument,
rpart would additionally return new.frame, new.where, and new.y fields
(corresponding to the existing frame, where, and y fields). The plotting
functions could then trivially be extended to use these new fields.
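
In the meantime, the counts themselves (though not the plot) can be obtained
outside rpart. The following is only a rough sketch, not the extension
described above: it assumes that predict() for an rpart object looks up each
row's fitted value in frame$yval, so overwriting yval with the node numbers
makes predict() report the leaf each new observation falls into. The data
split, the model formula, and the names node.fit, new.where, ancestors, and
new.n are all made up for illustration.

```r
library(rpart)

# Illustrative split of a built-in data set into "training" and "new" data.
train   <- mtcars[1:20, ]
newdata <- mtcars[21:32, ]

fit <- rpart(mpg ~ wt + hp, data = train,
             control = rpart.control(minsplit = 5))

# Copy the tree and replace each node's fitted value with its node number
# (the row names of the frame), so that predict() returns leaf node numbers.
node.fit <- fit
node.fit$frame$yval <- as.numeric(rownames(node.fit$frame))
new.where <- predict(node.fit, newdata = newdata, type = "vector")

# Counts of new observations in each leaf of the original tree.
leaf.ids    <- as.numeric(rownames(fit$frame))[fit$frame$var == "<leaf>"]
leaf.counts <- table(factor(new.where, levels = leaf.ids))

# Counts at every node, internal nodes included: in rpart's numbering the
# parent of node k is k %/% 2, so each leaf contributes to all its ancestors.
ancestors <- function(node) {
  out <- node
  while (node > 1) {
    node <- node %/% 2
    out <- c(out, node)
  }
  out
}
all.nodes <- as.numeric(rownames(fit$frame))
new.n <- sapply(all.nodes, function(k)
  sum(sapply(new.where, function(w) k %in% ancestors(w))))
names(new.n) <- all.nodes

print(leaf.counts)
print(new.n)
```

Here new.n plays roughly the role of the n column of a hypothetical new.frame,
and new.where the role of the proposed new.where field. Getting these numbers
onto the plotted tree would still need changes to the plotting functions.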