# [R] Tree question

Peter Flom flom at ndri.org
Tue Jul 15 20:44:29 CEST 2003

```I was under the impression that the tree method (e.g. as implemented in
rpart) was insensitive to monotonic transformations of the dependent
variable. For example, Breiman, Friedman, Olshen, and Stone,
Classification and Regression Trees, state "In a standard data
structure [a tree] is invariant under all monotone transformations of
individual ordered variables" (p. 57).

However, I get very different results from
tr.hh.pri <- rpart(log(YPRISX+1)~AGE+DRUGUSEY+SEX+OBSXNUM)

and

tr.hh.pri <- rpart(YPRISX~AGE+DRUGUSEY+SEX+OBSXNUM)

The former gives more splits, and different ones.
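
# A minimal sketch for checking the quoted invariance on simulated data
# (NOT the data described in this post). The names x, y, fit.raw, fit.logy,
# fit.logx and same.partition are hypothetical, made up for illustration.
# It fits one tree on the raw variables, one with a monotone transform of
# the response, and one with a monotone transform of a predictor, then
# checks whether the observations end up grouped into the same leaves.
library(rpart)

set.seed(1)
n <- 300
x <- runif(n, 18, 25)                      # assumed continuous, ordered predictor
y <- rpois(n, lambda = exp((25 - x) / 3))  # assumed skewed count response

fit.raw  <- rpart(y ~ x)
fit.logy <- rpart(log(y + 1) ~ x)          # transform the dependent variable
fit.logx <- rpart(y ~ log(x))              # transform an ordered predictor

# Two trees induce the same partition when every leaf of one maps onto
# exactly one leaf of the other (fit$where holds each row's leaf).
same.partition <- function(a, b) {
  tab <- table(a$where, b$where)
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

# If the invariance covers predictors, this should be TRUE:
same.partition(fit.raw, fit.logx)
# Whether this one holds is exactly the question asked here:
same.partition(fit.raw, fit.logy)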

Some notes:
The DV is a count variable, and highly skewed, with some 0s, many 1s,
and a long right tail out to 99.
AGE ranges from 18-25
DRUGUSEY is ordered (hardest drug used)
and
OBSXNUM is also ordered (proportion of your friends who object to your
having 'casual sex')

printing the first tree gives

 1) root 307 23.472040 0.7114605
   2) AGE>=19.5 196 13.811070 0.6857971
     4) OBSXNUM< 2.5 69  5.712526 0.6338252
       8) DRUGUSEY>=1.5 15  2.261203 0.5161601 *
       9) DRUGUSEY< 1.5 54  3.185960 0.6665100 *
     5) OBSXNUM>=2.5 127  7.810911 0.7140339 *
   3) AGE< 19.5 111  9.303947 0.7567761
     6) DRUGUSEY< 0.5 48  1.105266 0.6727132 *
     7) DRUGUSEY>=0.5 63  7.601052 0.8208239
      14) SEX>=1.5 21  1.258395 0.7317629 *
      15) SEX< 1.5 42  6.092803 0.8653544 *

printing the second tree gives

 1) root 307 144.540700 1.1205210
   2) AGE>=19.5 196  68.382650 1.0561220 *
   3) AGE< 19.5 111  73.909910 1.2342340
     6) DRUGUSEY< 0.5 48   2.979167 0.9791667 *
     7) DRUGUSEY>=0.5 63  65.428570 1.4285710
      14) SEX>=1.5 21   6.571429 1.1428570 *
      15) SEX< 1.5 42  56.285710 1.5714290 *

So, is this the 'exception that proves the rule'? Have I done something
wrong?  Or what?

Any ideas or thoughts?

Peter

