[R] Discretization of numeric attributes

Hans W. Borchers borchers at decrc.abb.de
Tue May 7 10:20:25 CEST 2002


Thanks for the time you are taking for this.

>Think about what the top level split in a tree does.  You could also
>extract the C routines used.

That's what I didn't want to do. Maybe it's worth writing and implementing 
these routines myself in S.

>You are misusing the terms though: C4.5 is not a splitting rule but a
>tree-construction and pruning algorithm, and MDL is a principle to choose
>complexity.

If you have a look at the well-known overview article "Supervised and 
unsupervised discretization of continuous features" by Dougherty, Kohavi 
and Sahami, you will see that people have extracted the discretization 
procedure used in C4.5 and evaluated it on its own. That's what I 
meant by "C45".

The MDL approach to discretization, as used by Fayyad and Irani or 
Kononenko (see the RELIEFF algorithm), is implemented in the freely 
available WEKA data mining toolkit. Something similar is what I need.
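
For illustration, the entropy-based MDL criterion of Fayyad and Irani can be 
sketched roughly as follows. This is not WEKA's code and not a tested 
library routine: the function names (entropy, best_cut, mdl_cuts) are made 
up for the example, and it is written in Python rather than S only to keep 
it short. A cut is accepted when the information gain exceeds 
(log2(N - 1) + Delta) / N, and the two halves are then split recursively.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(xs, ys):
    """Find the boundary minimising the weighted class entropy.

    xs must be sorted, with ys in the matching order. Returns a tuple
    (weighted_entropy, cut_value, index), or None if no cut is possible.
    """
    n = len(xs)
    best = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # a cut is only possible between distinct values
        left, right = ys[:i], ys[i:]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or e < best[0]:
            best = (e, (xs[i - 1] + xs[i]) / 2, i)
    return best


def _split(xs, ys):
    """Recursively collect the cut points that pass the MDL test."""
    found = best_cut(xs, ys)
    if found is None:
        return []
    e, cut, i = found
    n = len(xs)
    left, right = ys[:i], ys[i:]
    gain = entropy(ys) - e
    # k, k1, k2: number of distinct classes in the whole set and in each half
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * entropy(ys) - k1 * entropy(left) - k2 * entropy(right)
    )
    if gain <= (math.log2(n - 1) + delta) / n:
        return []  # the split does not pay for its own description length
    return _split(xs[:i], left) + [cut] + _split(xs[i:], right)


def mdl_cuts(values, labels):
    """Return the sorted list of accepted cut points for one numeric attribute."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    return _split(xs, ys)
```

Calling mdl_cuts(values, labels) on one numeric column and its class labels 
gives the interval boundaries; an empty list means the criterion found no 
split worth making.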

Very best, Hans Werner Borchers.



