[R] Tree question
Peter Flom
flom at ndri.org
Tue Jul 15 20:44:29 CEST 2003
I was under the impression that the tree method (e.g., as implemented in
rpart) was insensitive to monotonic transformations of the dependent
variable. For example, Breiman, Olshen et al., in Classification and
Regression Trees, state that "In a standard data structure [a tree] is
invariant under all monotone transformations of individual ordered
variables" (p. 57).
However, I get very different results from

tr.hh.pri <- rpart(log(YPRISX + 1) ~ AGE + DRUGUSEY + SEX + OBSXNUM)

and

tr.hh.pri <- rpart(YPRISX ~ AGE + DRUGUSEY + SEX + OBSXNUM)

The former gives more splits, and different ones.
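A toy check may help separate two cases. The quoted invariance appears to concern the ordered predictor variables, because a split on a predictor depends only on the ordering of its values. The splitting criterion itself, however, is computed on the response's own scale (for a numeric response rpart's default is method = "anova", which scores splits by within-node sums of squares), so a monotone transformation of the DV can change which split looks best. The sketch below is a hand-rolled single-split search, not rpart's code: it ignores rpart's minimum node sizes, surrogates, and pruning, and both the helper `best_split` and the data are made up for illustration.

```r
# Minimal sketch (not rpart itself): find the best single binary split of a
# numeric response y on one ordered predictor x, scoring each candidate cut
# by the total within-node sum of squares.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2    # midpoints between distinct x values
  sse <- function(v) sum((v - mean(v))^2)      # within-node sum of squares
  score <- sapply(cuts, function(cp) sse(y[x < cp]) + sse(y[x >= cp]))
  cp <- cuts[which.min(score)]
  list(cut = cp, left = x < cp)                # 'left' records which rows go left
}

# Made-up toy data: a skewed count response with one large value.
x <- 1:10
y <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 5)

# 1. Monotone transformation of the predictor: the same rows go left.
s_x    <- best_split(x, y)
s_expx <- best_split(exp(x), y)
identical(s_x$left, s_expx$left)   # TRUE

# 2. Monotone transformation of the response: the chosen cut moves.
best_split(x, y)$cut          # 9.5: isolates the single large count
best_split(x, log1p(y))$cut   # 5.5: separates the zeros from the rest
```

Here log1p(y) is the same as log(y + 1). On the raw scale the one large count dominates the sum of squares; after the log transform it no longer does, so a different partition wins.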
Some notes:
The DV is a count variable and highly skewed, with some 0s, many 1s, and
a long right tail out to 99.
AGE ranges from 18 to 25.
DRUGUSEY is ordered (hardest drug used), and
OBSXNUM is also ordered (proportion of your friends who object to your
having 'casual sex').
Printing the first tree gives:
1) root 307 23.472040 0.7114605
  2) AGE>=19.5 196 13.811070 0.6857971
    4) OBSXNUM< 2.5 69 5.712526 0.6338252
      8) DRUGUSEY>=1.5 15 2.261203 0.5161601 *
      9) DRUGUSEY< 1.5 54 3.185960 0.6665100 *
    5) OBSXNUM>=2.5 127 7.810911 0.7140339 *
  3) AGE< 19.5 111 9.303947 0.7567761
    6) DRUGUSEY< 0.5 48 1.105266 0.6727132 *
    7) DRUGUSEY>=0.5 63 7.601052 0.8208239
      14) SEX>=1.5 21 1.258395 0.7317629 *
      15) SEX< 1.5 42 6.092803 0.8653544 *
Printing the second tree gives:
1) root 307 144.540700 1.1205210
  2) AGE>=19.5 196 68.382650 1.0561220 *
  3) AGE< 19.5 111 73.909910 1.2342340
    6) DRUGUSEY< 0.5 48 2.979167 0.9791667 *
    7) DRUGUSEY>=0.5 63 65.428570 1.4285710
      14) SEX>=1.5 21 6.571429 1.1428570 *
      15) SEX< 1.5 42 56.285710 1.5714290 *
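Both printouts follow rpart's usual layout, node), split, n, deviance, yval (that header line is not shown above). For a numeric response, deviance is the within-node sum of squares and yval is the node mean, and the children of node k are numbered 2k and 2k+1. As a rough sketch, the printed numbers can be used to check two things: each parent's yval is the n-weighted mean of its children's, and, if rpart's defaults were used, every printed split removes more than cp = 0.01 of the root deviance. This is arithmetic on the values above, not a refit; `split_gain` is a hypothetical helper.

```r
# Arithmetic on the printed rpart output above (no model is refitted).
# Columns: node), split, n, deviance, yval. For a numeric response,
# deviance = within-node sum of squares and yval = node mean.

# Hypothetical helper: deviance removed by a split.
split_gain <- function(parent, left, right) parent - (left + right)

# First printout, response log(YPRISX + 1).
root_log <- 23.472040
gain_log <- c(
  node1 = split_gain(23.472040, 13.811070, 9.303947),
  node2 = split_gain(13.811070,  5.712526, 7.810911),
  node4 = split_gain( 5.712526,  2.261203, 3.185960),
  node3 = split_gain( 9.303947,  1.105266, 7.601052),
  node7 = split_gain( 7.601052,  1.258395, 6.092803)
)

# Second printout, response YPRISX.
root_raw <- 144.540700
gain_raw <- c(
  node1 = split_gain(144.540700, 68.382650, 73.909910),
  node3 = split_gain( 73.909910,  2.979167, 65.428570),
  node7 = split_gain( 65.428570,  6.571429, 56.285710)
)

# Gains relative to the root deviance, the scale on which rpart's cp is stated.
rel_log <- gain_log / root_log
rel_raw <- gain_raw / root_raw
print(round(rel_log, 4))
print(round(rel_raw, 4))

# yval check: the root mean is the n-weighted mean of its two children's means.
root_mean_log <- (196 * 0.6857971 + 111 * 0.7567761) / 307   # close to 0.7114605
root_mean_raw <- (196 * 1.0561220 + 111 * 1.2342340) / 307   # close to 1.1205210
print(c(root_mean_log, root_mean_raw))
```

Because each deviance is measured on the scale of the response being modeled, these relative gains are not comparable from one fit to the other, and a candidate split can clear the cp threshold under one transformation of the DV but not under another.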
So, is this the 'exception that proves the rule'? Have I done something
wrong, or is something else going on?
Any ideas or thoughts?
Thanks in advance
Peter
Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)