[R] tree model with at most one split point per variable
Gabor Grothendieck
ggrothendieck at myway.com
Thu Jun 24 16:08:57 CEST 2004
I would like to create a tree model with at most one split point per variable
using tree, rpart or some other routine. It's OK if a variable enters at more
than one node, but if it does, then all splits on that variable should be
at the same point. The idea is that I want to be able to summarize the
data as binary factors at the chosen split points. I don't want
factors with three or more levels.
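To make the goal concrete, here is a minimal base-R sketch of "summarize the data as binary factors": each numeric variable is cut once, at a single split point, into a two-level factor. The cut values (Petal.Length at 2.45, Petal.Width at 1.75) are taken from the tree output in the example that follows; the helper name `binarize` and the level labels "low"/"high" are hypothetical, chosen only for illustration.

```r
data(iris)

# Hypothetical single split point per variable (values read off the tree output).
cuts <- c(Petal.Length = 2.45, Petal.Width = 1.75)

# Hypothetical helper: turn a numeric vector into a two-level factor at one cut.
binarize <- function(x, cut) {
  factor(ifelse(x < cut, "low", "high"), levels = c("low", "high"))
}

# One binary factor per variable, named after the original variable.
binary <- as.data.frame(setNames(
  lapply(names(cuts), function(v) binarize(iris[[v]], cuts[[v]])),
  names(cuts)
))
binary$Species <- iris$Species

# Cross-tabulate one binary factor against the response.
print(table(binary$Petal.Length, binary$Species))
```

If a variable were split at two different points, it could not be represented this way without a third level, which is exactly what the constraint is meant to rule out.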
For example, the following shows that the first split is with Petal.Length
splitting at 2.45; however, there are other splits of Petal.Length at
4.95. I want to disallow that.
R> library(tree)
R> data(iris)
R> tree(Species ~., data = iris)
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

  1) root 150 329.600 setosa ( 0.33333 0.33333 0.33333 )
    2) Petal.Length < 2.45 50   0.000 setosa ( 1.00000 0.00000 0.00000 ) *
    3) Petal.Length > 2.45 100 138.600 versicolor ( 0.00000 0.50000 0.50000 )
      6) Petal.Width < 1.75 54  33.320 versicolor ( 0.00000 0.90741 0.09259 )
       12) Petal.Length < 4.95 48   9.721 versicolor ( 0.00000 0.97917 0.02083 )
         24) Sepal.Length < 5.15 5   5.004 versicolor ( 0.00000 0.80000 0.20000 ) *
         25) Sepal.Length > 5.15 43   0.000 versicolor ( 0.00000 1.00000 0.00000 ) *
       13) Petal.Length > 4.95 6   7.638 virginica ( 0.00000 0.33333 0.66667 ) *
      7) Petal.Width > 1.75 46   9.635 virginica ( 0.00000 0.02174 0.97826 )
       14) Petal.Length < 4.95 6   5.407 virginica ( 0.00000 0.16667 0.83333 ) *
       15) Petal.Length > 4.95 40   0.000 virginica ( 0.00000 0.00000 1.00000 ) *
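The violation can also be detected programmatically. The sketch below assumes the `tree` package's fitted-object layout: `fit$frame` has one row per node, `var` names the split variable (leaves are "<leaf>"), and `splits` is a character matrix with columns `cutleft` and `cutright`. It lists the distinct cut labels per variable and flags any variable split at more than one point; it only diagnoses the problem and does not constrain the fit.

```r
# Requires the 'tree' package: install.packages("tree")
library(tree)
data(iris)
fit <- tree(Species ~ ., data = iris)

# Keep only internal (non-leaf) nodes of the fitted tree.
fr <- fit$frame
internal <- as.character(fr$var) != "<leaf>"
split_vars <- as.character(fr$var[internal])
split_cuts <- fr$splits[internal, "cutleft"]   # e.g. "<2.45"

# Distinct cut labels per variable.
cut_points <- lapply(split(split_cuts, split_vars), unique)
print(cut_points)

# Variables split at more than one point (these would need 3+ level factors).
multi <- names(cut_points)[sapply(cut_points, length) > 1]
print(multi)
```

For the iris tree above, only Petal.Length is flagged, since it is cut at both 2.45 and 4.95 while Petal.Width and Sepal.Length each appear at a single point.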