[R] how to use the rpart function?

Liaw, Andy andy_liaw at merck.com
Sun Mar 12 04:54:11 CET 2006


I believe it would be more productive to understand what the parameters
do than to tune them blindly by brute force.  Whether tuning any of the
parameters has an impact on the error rate (I assume you're referring to
some estimate of the test set error rate, as there's not much point in
looking at the training error rate) can also depend on the nature of
your data.
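
A minimal sketch of estimating a test set error rate for a few control
settings. It assumes the kyphosis data set that ships with rpart, a
one-third hold-out split, and an arbitrary seed; none of these come from
the original thread, and the helper name holdout_error is hypothetical.

```r
library(rpart)

# Assumptions for illustration only: the kyphosis data set that ships with
# rpart, a one-third hold-out split, and an arbitrary seed.
set.seed(1)
n        <- nrow(kyphosis)
test_idx <- sample(n, size = round(n / 3))
train    <- kyphosis[-test_idx, ]
test     <- kyphosis[test_idx, ]

# Fit a classification tree with the given control settings and return the
# misclassification rate on the held-out rows.
holdout_error <- function(ctrl) {
  fit  <- rpart(Kyphosis ~ Age + Number + Start, data = train,
                method = "class", control = ctrl)
  pred <- predict(fit, newdata = test, type = "class")
  mean(pred != test$Kyphosis)
}

errors <- c(
  default     = holdout_error(rpart.control()),
  small_cp    = holdout_error(rpart.control(cp = 0.001)),
  small_split = holdout_error(rpart.control(cp = 0.001, minsplit = 5))
)
print(errors)
```

With cp left at its default of 0.01, a split that does not improve the
fit by at least that amount is not kept, so relaxing minsplit alone may
change nothing.  That is one possible reason for seeing only cp matter,
though it will depend on the data.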

I believe the rpart package comes with a pdf file of a tech report by its
original authors.  It's worth reading.
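
A rough sketch of where to look, assuming a reasonably recent rpart
installation (vignette names vary by version) and again using the
kyphosis data purely for illustration.  It also shows the training error
rate and the cross-validated error in the complexity table, which is
usually the more useful guide for choosing cp.

```r
library(rpart)

# Names of the vignettes installed with rpart (these vary by version).
rpart_vignettes <- vignette(package = "rpart")$results[, "Item"]
print(rpart_vignettes)

set.seed(1)  # cross-validation folds are random; the seed is arbitrary
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")

# Complexity table: "rel error" is on the training data, "xerror" is the
# cross-validated estimate, both relative to the root node error.
printcp(fit)

# Training (resubstitution) misclassification rate, computed directly.
train_error <- mean(predict(fit, type = "class") != kyphosis$Kyphosis)

# Prune back to the cp value with the smallest cross-validated error.
cp_table <- fit$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned   <- prune(fit, cp = best_cp)
```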

Andy

From: Michael
> 
> I've spent many hours on these parameters.
> 
> I changed them one by one, and then exhaustively tried all the
> possible combinations.
> 
> To my surprise, only "cp" affects the performance of the
> classifier.
> 
> Others, e.g. "maxsplit", do not affect the error rate at all.
> 
> I felt cheated by rpart.control().
> 
> 
> On 3/11/06, Michael <comtech.usa at gmail.com> wrote:
> >
> >  Yes, rpart.control() has a bunch of parameters...
> > I don't know which one would most improve the classification
> > performance.
> >
> >
> >  On 3/9/06, Carlos Ortega <coforfe at gmail.com> wrote:
> > >
> > >  Hello,
> > >
> > > Yes, check rpart.control() for details.
> > >
> > > Regards,
> > >  Carlos.
> > >
> > >
> > > >  On 3/9/06, Michael <comtech.usa at gmail.com> wrote:
> > > >
> > > > I see! So you mean I have to collect error counts myself 
> > > > manually...
> > > >
> > > > By the way, what parameters do I normally change to improve the 
> > > > default rpart performance?
> > > >
> > > > Thanks a lot!
> > > >
> > > >
> > > > On 3/8/06, Carlos Ortega <coforfe at gmail.com> wrote:
> > > > >
> > > > > Hello Michael,
> > > > >
> > > > > > In some of the examples for the rpart function you will find
> > > > > > that comparison between the actual and the predicted values,
> > > > > > although it is for the "Classification" mode, not for
> > > > > > regression.
> > > > >
> > > > > Regards,
> > > > > Carlos.
> > > > >
> > > > >
> > > > > >  On 3/7/06, Michael <comtech.usa at gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > >
> > > > > >
> > > > > > > What parameter do I normally change in the rpart function?
> > > > > > > How do I set the "cp" option?
> > > > > >
> > > > > >
> > > > > >
> > > > > > > Is there a way to read off the error rate directly from the
> > > > > > > "rpart" function for training data? I imagine for testing
> > > > > > > data I have to apply a "predict", but for training data I
> > > > > > > guess the error count would exist somewhere once the "rpart"
> > > > > > > function is finished. It looks like it is related to
> > > > > > > expressions such as "expected loss=0.8362365" when using the
> > > > > > > "summary" function.
> > > > > >
> > > > > > > Now I have to do this manually, and when it comes to
> > > > > > > comparing the correct vs. wrong predictions and counting
> > > > > > > the errors, it has always been very tedious...
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks a lot!
> > > > > >
> > > > > >
> > > > > >
> > > > > > M.
> > > > > >
> > > > > >        [[alternative HTML version deleted]]
> > > > > >
> > > > > > ______________________________________________
> > > > > > R-help at stat.math.ethz.ch mailing list 
> > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > > PLEASE do read the posting guide!
> > > > > > > http://www.R-project.org/posting-guide.html
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >



