[R] Questions on RWeka classifiers?

Hans W. Borchers hwborchers at gmail.com
Wed Nov 28 13:12:46 CET 2007


Li Li <lilycai2007 <at> gmail.com> writes:

> 
> Hi,
> 
> I am using some classifiers in the RWeka package and ran into a couple of problems.
> 
> (1) J48 implements the C4.5 classifier, and C4.5 should be able to handle
>      missing values in both the training set and the test set. But I found
>      that the J48 classifier cannot be evaluated on a test set with missing
>      values; it just ignores them.

Why don't you ask this question on the WEKA mailing list, for instance at
http://news.gmane.org/gmane.comp.ai.weka ?

If I remember correctly, C4.5 simply drops examples with missing values,
while C5.0 handles them more intelligently. C5.0 also treats numerical
attributes more sensibly than C4.5 or CART.

Unfortunately, C5.0 is commercial software, but you can get a two-week demo
from Quinlan's site.

> (2) The ensemble classifiers in RWeka such as bagging and boosting: there
>      is a control argument "W" that specifies which base classifier should
>      be used. I use "W=J48" to boost a C4.5 tree, but I am not sure how to
>      downsize the tree to be a "weak" learner. From what I observed, the
>      default boosted J48 tree gets worse performance.

This is difficult to answer without any concrete data. From my own experience I
can say that in many cases results have distinctly improved when applying
AdaBoost.
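
As for shrinking the base learner, here is a minimal sketch of two options,
assuming a current RWeka installation in which AdaBoostM1 accepts a base
learner through Weka_control(W = ...). The iris data, the value M = 30, and
the object names are illustrative choices only, not taken from the original
question. M is J48's minimum number of instances per leaf, so a larger value
should give a smaller tree.

```r
library(RWeka)

# Option 1: boost decision stumps (one-level trees), a common weak learner.
# iris is used purely as stand-in data.
m_stump <- AdaBoostM1(Species ~ ., data = iris,
                      control = Weka_control(W = "DecisionStump"))

# Option 2: keep J48 as the base learner but constrain it.
# M = 30 is an arbitrary illustrative value for the minimum leaf size.
m_j48 <- AdaBoostM1(Species ~ ., data = iris,
                    control = Weka_control(W = list(J48, M = 30)))

# Compare the two with 10-fold cross-validation instead of
# training-set accuracy.
evaluate_Weka_classifier(m_stump, numFolds = 10, seed = 1)
evaluate_Weka_classifier(m_j48, numFolds = 10, seed = 1)
```

Whether either variant beats a single unboosted J48 tree depends on the data,
so it is worth running the same cross-validation on a plain J48 fit as a
baseline.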

> 
> Thanks for any discussion and help,
> 
> Li
> 

Regards,  Hans Werner Borchers


