[R] Tree question

Peter Flom flom at ndri.org
Tue Jul 15 20:44:29 CEST 2003


I was under the impression that the tree method (e.g. as implemented in
rpart) was insensitive to monotonic transformations of the dependent
variable. For example, Breiman, Friedman, Olshen and Stone, in
Classification and Regression Trees, state "In a standard data structure
[a tree] is invariant under all monotone transformations of individual
ordered variables" (p. 57).

However, I get very different results from
tr.hh.pri <- rpart(log(YPRISX+1)~AGE+DRUGUSEY+SEX+OBSXNUM)

and 

tr.hh.pri <- rpart(YPRISX~AGE+DRUGUSEY+SEX+OBSXNUM)

The former gives more splits, and different ones.

Some notes:
The DV is a count variable and highly skewed, with some 0s, many 1s, and
a long right tail out to 99.
AGE ranges from 18 to 25.
DRUGUSEY is ordered (hardest drug used).
OBSXNUM is also ordered (proportion of your friends who object to your
having 'casual sex').

Printing the first tree gives:

 1) root 307 23.472040 0.7114605  
   2) AGE>=19.5 196 13.811070 0.6857971  
     4) OBSXNUM< 2.5 69  5.712526 0.6338252  
       8) DRUGUSEY>=1.5 15  2.261203 0.5161601 *
       9) DRUGUSEY< 1.5 54  3.185960 0.6665100 *
     5) OBSXNUM>=2.5 127  7.810911 0.7140339 *
   3) AGE< 19.5 111  9.303947 0.7567761  
     6) DRUGUSEY< 0.5 48  1.105266 0.6727132 *
     7) DRUGUSEY>=0.5 63  7.601052 0.8208239  
      14) SEX>=1.5 21  1.258395 0.7317629 *
      15) SEX< 1.5 42  6.092803 0.8653544 *


Printing the second tree gives:

 1) root 307 144.540700 1.1205210  
   2) AGE>=19.5 196  68.382650 1.0561220 *
   3) AGE< 19.5 111  73.909910 1.2342340  
     6) DRUGUSEY< 0.5 48   2.979167 0.9791667 *
     7) DRUGUSEY>=0.5 63  65.428570 1.4285710  
      14) SEX>=1.5 21   6.571429 1.1428570 *
      15) SEX< 1.5 42  56.285710 1.5714290 *
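
For anyone who wants to try the comparison without my data, here is a
minimal sketch on simulated data. Everything in it is an assumption made
for illustration: the data frame d, the way the outcome is generated, and
the object names fit.raw, fit.log and fit.x are hypothetical stand-ins, not
my real data or results. It fits the tree with the raw response and with
the logged response, and adds a third fit in which a predictor (AGE) is
logged instead. The quoted passage speaks of "ordered variables", which
may refer to the predictors and not the response, so the third fit is
there to check that reading.

```r
library(rpart)

set.seed(1)
n <- 307

# Simulated stand-ins for the predictors (hypothetical, not the study data)
d <- data.frame(
  AGE      = sample(18:25, n, replace = TRUE),
  DRUGUSEY = sample(0:3,   n, replace = TRUE),
  SEX      = sample(1:2,   n, replace = TRUE),
  OBSXNUM  = sample(1:4,   n, replace = TRUE)
)

# Skewed count outcome whose mean depends on AGE and DRUGUSEY (an assumption)
d$YPRISX <- rnbinom(n, size = 0.5, mu = 1 + (d$AGE < 20) + 0.5 * d$DRUGUSEY)

# Raw response versus monotone-transformed response
fit.raw <- rpart(YPRISX ~ AGE + DRUGUSEY + SEX + OBSXNUM, data = d)
fit.log <- rpart(log(YPRISX + 1) ~ AGE + DRUGUSEY + SEX + OBSXNUM, data = d)

# Raw response, but with a monotone transformation of one predictor
fit.x <- rpart(YPRISX ~ log(AGE) + DRUGUSEY + SEX + OBSXNUM, data = d)

print(fit.raw)
print(fit.log)
print(fit.x)

# fit$where holds the leaf each observation falls in, so comparing these
# vectors shows whether two fits partition the observations the same way
print(identical(fit.raw$where, fit.log$where))
print(identical(fit.raw$where, fit.x$where))
```

With the default method for a numeric response, rpart chooses splits by
reduction in the sum of squares of the response, so rescaling the response
changes what is being minimised, whereas transforming a predictor only
relabels the candidate cut points.
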


So, is this the 'exception that proves the rule'? Have I done something
wrong?  Or what?

Any ideas or thoughts?

Thanks in advance


Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
www.peterflom.com
New York, NY 10010
(212) 845-4485 (voice)
(917) 438-0894 (fax)



