[R] rpart puzzle

Marc Feldesman feldesmanm at pdx.edu
Thu Jul 12 22:03:23 CEST 2001


Two problems here:

1)  rpart is supposed to follow the Breiman et al. (1984) monograph, which 
looks at all n*v values of potential splitters (n = cases; v = variables) 
and then splits at the midpoint using the rule:

x7<= 37
x7 > 37

2)  It makes the tree useless for dealing with unknown observations where 
x7 may happen to equal 37.
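
The midpoint rule in point 1) can be sketched in a few lines. This is a minimal Python illustration, not code from rpart, tree, or CART. The function names `candidate_cutpoints` and `branch` are hypothetical, and the sample values of x7 are invented. The sketch lists the midpoints between consecutive distinct observed values of one variable. It then shows that the pair `x <= c` / `x > c` sends every value to some branch, while a strict pair `x < c` / `x > c` leaves a case exactly at `c` in neither branch.

```python
def candidate_cutpoints(values):
    """Midpoints between consecutive distinct observed values of one variable."""
    distinct = sorted(set(values))
    # n distinct values give at most n - 1 candidate cutpoints
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def branch(x, cut, inclusive=True):
    """Return 'left' or 'right' for value x, or None if neither rule matches.

    inclusive=True uses the pair  x <= cut / x > cut  (every value is covered);
    inclusive=False uses the pair x <  cut / x > cut  (x == cut is not covered).
    """
    goes_left = x <= cut if inclusive else x < cut
    if goes_left:
        return "left"
    if x > cut:
        return "right"
    return None  # only reachable with the strict pair, when x == cut


# Hypothetical training values of x7 in which 37 itself never occurs
x7 = [30, 33, 36, 38, 41]
print(candidate_cutpoints(x7))            # [31.5, 34.5, 37.0, 39.5]
print(branch(37, 37.0))                   # left
print(branch(37, 37.0, inclusive=False))  # None
```

With observed values 36 and 38 and nothing in between, the only cutpoint separating them is 37.0. Under the inclusive pair, a new case with x7 = 37 has a definite branch. Under the strict pair, it has none.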

This came to my notice because of precisely this 
circumstance.  I found that rpart moved to a surrogate variable when 
x7 = 37.  There was no need to use a surrogate, since x7 wasn't missing and 
took a value well within x7's range.
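
For comparison, here is a minimal sketch of the behavior expected from a CART-style node, where surrogate splits are consulted only when the primary variable is missing. It is not rpart's internal code. The function `route`, the surrogate variable `x3`, and both cutpoints are hypothetical. The sketch also simplifies: it ignores surrogates that send cases in the reverse direction, and it returns `None` when every splitting variable is missing.

```python
def route(case, primary, surrogates):
    """Send a case 'left' or 'right' at one node.

    primary and each surrogate are (variable, cutpoint) pairs using the rule
    value <= cutpoint -> 'left', value > cutpoint -> 'right'.
    Surrogates are tried, in order, only when the primary value is missing.
    Returns (direction, variable_used).
    """
    for var, cut in [primary, *surrogates]:
        value = case.get(var)  # a missing variable is treated as None
        if value is not None:
            return ("left" if value <= cut else "right"), var
    return None, None  # every splitting variable is missing


node_primary = ("x7", 37.0)
node_surrogates = [("x3", 12.5)]  # hypothetical surrogate split

print(route({"x7": 37, "x3": 20}, node_primary, node_surrogates))    # ('left', 'x7')
print(route({"x7": None, "x3": 20}, node_primary, node_surrogates))  # ('right', 'x3')
```

Here a case with x7 = 37 is decided by x7 itself. Only the case with x7 missing falls through to the surrogate.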



At 11:22 AM 7/12/2001 -0700, White.Denis at epamail.epa.gov wrote:

>I haven't looked that carefully at rpart, but in the tree package the
>potential splits are midpoints between actual data values.  So if x7 had
>values of 36 and 38, but not 37, a valid split would be < 37 and > 37.
>
>    From:     Marc Feldesman <feldesmanm at pdx.edu>
>    Sent by:  owner-r-help at stat.math.ethz.ch
>    Date:     07/12/2001 09:02
>    To:       r-help at stat.math.ethz.ch
>    cc:       therneau at mayo.edu
>    Subject:  [R] rpart puzzle
>
>I've been using the package rpart with R 1.3.0 for Windows to produce
>simple classification trees for some measurement data from paleontological
>specimens.  Both the rpart documentation and the output confirm that the
>program produces splits on continuous data that leave "holes" in the
>data.  It is probably of little practical importance, but is there a reason
>why the binary splits are constructed in the form (e.g.):
>
>x7 < 37
>x7 > 37
>
>as opposed to the actual CART (tm) methodology of:
>
>x7 <= 37
>x7 > 37
>
>It seems to me that if one were to use rpart to classify an unknown case
>where x7 = 37, the program wouldn't actually know which way to move the
>case.
>
>I've read through the rpart technical report, the rpart user's manual, and
>the rpart help file, and I see this practice illustrated, but I can't find
>any explanation for this minor (and probably trivial) departure from the
>methodology illustrated in the CART program and in the Breiman et al. book.
>
>=====================
>Dr. Marc R. Feldesman
>Professor and Chairman
>Anthropology Department
>Portland State University
>1721 SW Broadway
>Portland, Oregon 97201
>email:  feldesmanm at pdx.edu
>phone:  503-725-3081
>fax:    503-725-3905
>http://web.pdx.edu/~h1mf
>PGP Key Available On Request
>======================
>
>"Anyway, no drug, not even alcohol, causes the fundamental ills of society.
>If we're looking for the source of our troubles, we shouldn't test people
>for drugs, we should test them for stupidity, ignorance, greed and love of
>power."   P.J. O'Rourke
>
>Powered by Optiplochoerus and Windows 2000 (scary isn't it?)
>
>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._