[R] rpart puzzle
Marc Feldesman
feldesmanm at pdx.edu
Thu Jul 12 22:03:23 CEST 2001
Two problems here:
1) rpart is supposed to follow the Breiman et al. (1984) monograph, which
examines all n*v potential split values (n = cases; v = variables)
and then splits at the midpoint using the rule:
x7 <= 37
x7 > 37
2) A split of the form x7 < 37 / x7 > 37 makes the tree useless for
classifying unknown observations where x7 happens to equal 37.
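
To make the two points concrete, here is a minimal sketch in Python (chosen only for illustration; it is not rpart's or CART's actual code). The function names and the x7 values are made up. It assumes candidate cut points are midpoints between consecutive distinct data values, and contrasts the closed rule (x <= cut) with the open rule as printed (x < cut / x > cut).

```python
def candidate_midpoints(values):
    """Return midpoints between consecutive distinct sorted values."""
    distinct = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


def route_closed(x, cut):
    """CART-style rule: x <= cut goes left, x > cut goes right."""
    return "left" if x <= cut else "right"


def route_open(x, cut):
    """Rule as printed (x < cut / x > cut); None when x equals the cut."""
    if x < cut:
        return "left"
    if x > cut:
        return "right"
    return None  # x == cut: neither branch claims the case


# Hypothetical training values for x7; none of them equals 37.
x7 = [31, 36, 36, 38, 42]
cuts = candidate_midpoints(x7)

# A new case with x7 = 37 sits exactly on the midpoint between 36 and 38.
closed_result = route_closed(37, 37)
open_result = route_open(37, 37)
```

Under these assumptions the closed rule sends x7 = 37 left, while the open rule leaves it unassigned, which is the "hole" in question.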
This came to my notice because of precisely this circumstance: rpart
moved to a surrogate variable when x7 = 37. There was no need to use a
surrogate, since x7 wasn't missing and its value was well within x7's
range.
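
The behavior I expected can be sketched as follows, again in Python and purely as an illustration. The function name, the surrogate variable x3, and its cut point of 10 are hypothetical, and a real surrogate split also carries a direction, which is omitted here. The assumption is that the surrogate is consulted only when the primary variable is actually missing.

```python
def route_with_surrogate(case, primary, surrogate):
    """Route a case using the primary split; fall back to the surrogate
    only when the primary variable is missing (None).

    primary and surrogate are (variable_name, cut) pairs, and both use
    the closed rule: value <= cut goes left, value > cut goes right.
    """
    name, cut = primary
    value = case.get(name)
    if value is not None:
        return ("primary", "left" if value <= cut else "right")

    s_name, s_cut = surrogate
    s_value = case.get(s_name)
    if s_value is None:
        return ("undecided", None)  # both variables missing
    return ("surrogate", "left" if s_value <= s_cut else "right")


primary = ("x7", 37)
surrogate = ("x3", 10)  # hypothetical surrogate split

observed = route_with_surrogate({"x7": 37, "x3": 5}, primary, surrogate)
missing = route_with_surrogate({"x7": None, "x3": 5}, primary, surrogate)
```

Here a case with x7 = 37 is decided by the primary split, and only the case with x7 missing falls through to x3.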
At 11:22 AM 7/12/2001 -0700, White.Denis at epamail.epa.gov wrote:
>I haven't looked that carefully at rpart, but in tree the potential splits
>are midpoints between actual data values. So if x7 had values of 36 and
>38, but not 37, a valid split would be < 37 and > 37.
>
>Marc Feldesman <feldesmanm at pdx.edu>
>Sent by: owner-r-help at stat.math.ethz.ch
>07/12/2001 09:02
>To: r-help at stat.math.ethz.ch
>cc: therneau at mayo.edu
>Subject: [R] rpart puzzle
>
>I've been using the package rpart with R 1.3.0 for Windows to produce
>simple classification trees for some measurement data from paleontological
>specimens. Both the rpart documentation and the output confirm that the
>program produces splits on continuous data that leave "holes" in the
>data. It is probably of little practical importance, but is there a reason
>why the binary splits are constructed in the form (e.g.):
>
>x7 < 37
>x7 > 37
>
>as opposed to the actual CART (tm) methodology of:
>
>x7 <= 37
>x7 > 37
>
>It seems to me that if one were to use rpart to classify an unknown case
>where x7 = 37, the program wouldn't actually know which way to move the
>case.
>
>I've read through the rpart technical report, the rpart user's manual, the
>rpart help file and see this practice illustrated, but don't find any
>explanation for this minor (and probably trivial) departure from the
>methodology illustrated in the CART program and in the Breiman et al book.
>
>=====================
>Dr. Marc R. Feldesman
>Professor and Chairman
>Anthropology Department
>Portland State University
>1721 SW Broadway
>Portland, Oregon 97201
>email: feldesmanm at pdx.edu
>phone: 503-725-3081
>fax: 503-725-3905
>http://web.pdx.edu/~h1mf
>PGP Key Available On Request
>======================
>
>"Anyway, no drug, not even alcohol, causes the fundamental ills of society.
>If we're looking for the source of our troubles, we shouldn't test people
>for drugs, we should test them for stupidity, ignorance, greed and love of
>power." P.J. O'Rourke
>
>Powered by Optiplochoerus and Windows 2000 (scary isn't it?)
>
>-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._