[R] rpart puzzle

Marc Feldesman feldesmanm at pdx.edu
Thu Jul 12 23:04:42 CEST 2001


I amend my previous observation.  After constructing a very careful 
example, I find that rpart works exactly the opposite way from CART.  In 
the following split:

x7 < 37  go left
x7 > 37  go right

if x7=37 the case appears to go right.  In other words, the split appears 
to be of the form:

x7 < 37
x7 >= 37,

which is precisely the opposite of the form that CART(tm) uses.
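
To make the two conventions concrete, here is a minimal sketch in Python (not rpart or CART code). The function name route and the flag equal_goes_left are hypothetical, and the CART form is assumed to be x7 <= 37 / x7 > 37, as the description above implies.

```python
def route(value, cut, equal_goes_left):
    """Return 'left' or 'right' for a single numeric split at `cut`.

    equal_goes_left=False models the form observed here for rpart
    (x7 < 37 left, x7 >= 37 right).
    equal_goes_left=True models the assumed CART form
    (x7 <= 37 left, x7 > 37 right).
    """
    if value < cut:
        return "left"
    if value > cut:
        return "right"
    # value == cut: the only case where the two conventions disagree
    return "left" if equal_goes_left else "right"


CUT = 37
for x7 in (36.9999, 37, 37.0001):
    # Print the value and where each convention sends it
    print(x7, route(x7, CUT, False), route(x7, CUT, True))
```

Only a case sitting exactly on the cut point is routed differently; every other value goes the same way under both forms.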

Again, I'm not sure what practical difference this makes, except that when a 
case's value on the primary splitter falls in an (apparently) excluded part 
of the domain, the case goes with the "no" answer to the question.  (This 
is, of course, obvious if typical 'short-circuit' evaluation is used: 
because the value fails the first test (x7 < 37), it must go with the 
alternative.)  In CART, the case goes with the "yes" answer.  I don't know 
what tree does, since I don't use it.

In my test example, rpart's behavior results in a misclassification.  Had 
the test gone the other way, the case would have been classified 
correctly.  Walking the tree demonstrates this quite easily.  Also, 
changing the value from 37 to 36.9999 produces the correct 
classification.  (Now I *do* realize that I'm working with floating point 
numbers and so "real" 37 may not truly equal "integer" 37, which may 
account for *this* anomaly.)
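
The floating point caveat can be illustrated with a small Python sketch. The value 37.0 - 1e-12 is a hypothetical stand-in for a number that prints as 37 but is stored slightly below it, and goes_left is an illustrative name, not an rpart function.

```python
CUT = 37.0

exact = 37.0            # stored exactly; compares equal to integer 37
nearly = 37.0 - 1e-12   # hypothetical value that merely prints as 37


def goes_left(value, cut=CUT):
    """Strict form of the split: only values below the cut go left."""
    return value < cut


for x in (exact, nearly):
    # Rounded display, full stored value, and the side the case takes
    print(format(x, ".6g"), repr(x), goes_left(x))
```

Both values display as 37 at six significant digits, yet only the exact one fails the x7 < 37 test. Printing a suspect value at full precision is a quick way to check whether a case really sits on the cut point.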

Did I have the misfortune to pull an "unknown" with a major primary 
splitter occupying an ambiguous part of the domain, or is this a more 
significant problem?






