[R] CART vs. Random Forest
Andrew Baek
andrew at stat.ucla.edu
Thu Sep 26 20:31:54 CEST 2002
I implemented "priors" in both packages. Note that the
"cost matrix" (loss matrix) in "rpart" seems to have a bug:
you should use the transpose of the "real" cost matrix. For my
problem, RF doesn't look superior to "rpart". I guess
it depends on the data set. Thank you.
Andrew
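
To illustrate why the orientation of a cost matrix matters, here is a minimal, language-neutral sketch in Python. It does not reproduce rpart's internals. The loss values, the probabilities, and the helper names (`transpose`, `min_expected_loss_class`) are hypothetical, and the convention loss[true][predicted] is an assumption made only for this example.

```python
# Hypothetical 2x2 loss matrix, assumed convention: loss[true][predicted].
loss = [
    [0.0, 1.0],  # true class 0: predicting 1 (false alarm) costs 1
    [5.0, 0.0],  # true class 1: predicting 0 (a miss) costs 5
]


def transpose(matrix):
    """Swap rows and columns of a square matrix given as nested lists."""
    return [list(row) for row in zip(*matrix)]


def min_expected_loss_class(probs, loss_matrix):
    """Return the predicted class with the smallest expected loss.

    probs[t] is the estimated probability that the true class is t.
    """
    n = len(probs)
    expected = [
        sum(probs[t] * loss_matrix[t][c] for t in range(n))
        for c in range(n)
    ]
    return min(range(n), key=lambda c: expected[c])


# Made-up node probabilities: 70% class 0, 30% class 1.
probs = [0.7, 0.3]
as_intended = min_expected_loss_class(probs, loss)
as_transposed = min_expected_loss_class(probs, transpose(loss))
print("intended matrix   -> predict", as_intended)
print("transposed matrix -> predict", as_transposed)
```

With these made-up numbers the two orientations give opposite predictions, so it is worth checking which convention a package expects before trusting the results.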
On Thu, 26 Sep 2002, Wiener, Matthew wrote:
> If either method were just guessing 0 to reduce the error rate, shouldn't
> they achieve a 1/34 ~ 3% or 1/100 = 1% error rate in the last two examples?
> And for that matter 20% and 10% in the first two? It doesn't look like
> that's what's going on.
>
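
The arithmetic in the paragraph above can be checked with a short sketch (Python, standard library only). The error rates are the ones reported in the original post; the function name `always_zero_error` is made up for this illustration.

```python
def always_zero_error(ones, zeros):
    """Error rate (as a fraction) of a rule that always predicts class 0."""
    return ones / (ones + zeros)


# Class ratios (1s to 0s) and the error rates in percent (CART, RF)
# quoted in the original post.
cases = {
    "1:4": ((1, 4), (25.2, 25.6)),
    "1:9": ((1, 9), (28.5, 21.2)),
    "1:33": ((1, 33), (28.2, 12.1)),
    "1:99": ((1, 99), (25.1, 7.3)),
}

baselines = {}
for label, (ratio, (cart, rf)) in cases.items():
    baselines[label] = 100 * always_zero_error(*ratio)
    print(f"{label:>5}: always-0 = {baselines[label]:5.2f}%  "
          f"CART = {cart}%  RF = {rf}%")

# True when every reported rate is above the always-0 baseline.
all_above = all(
    min(rates) > baselines[label] for label, (_, rates) in cases.items()
)
print("all reported rates exceed the always-0 baseline:", all_above)
```

Every reported error rate is higher than the always-0 baseline, which is why neither method appears to be simply guessing 0.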
> One suggestion if making sure you find the 1's is more important than having
> a low overall error rate: in rpart, you can specify a loss matrix to say
> that certain kinds of errors are more important than others. In a random
> forest, you can use different voting thresholds for "1-ness" and "0-ness" to
> bias things -- that is, instead of just taking majority vote, you might
> require (for example) 85% of the trees to agree for something to be declared
> in class 0.
>
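
The voting-threshold idea above can be sketched as follows. This is a simplified illustration in Python, not the randomForest implementation. The function name `classify_by_votes` and the vote counts are hypothetical.

```python
def classify_by_votes(votes_for_0, n_trees, threshold_0=0.5):
    """Declare class 0 only if at least `threshold_0` of the trees vote 0.

    Otherwise the observation is declared class 1.
    """
    return 0 if votes_for_0 / n_trees >= threshold_0 else 1


# 70 of 100 trees vote "0".
majority = classify_by_votes(70, 100)                   # plain majority vote
strict = classify_by_votes(70, 100, threshold_0=0.85)   # 85% agreement needed
strong = classify_by_votes(90, 100, threshold_0=0.85)   # 90% agree on "0"
print(majority, strict, strong)
```

Raising the threshold for class 0 moves borderline cases into class 1, trading more false alarms for fewer missed 1's.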
> It's hard to say much more without knowing anything about your data. But in
> my experience random forests have substantially outperformed single trees in
> many problems (and I haven't yet encountered one in which a single tree
> outperformed a random forest).
>
> Hope this helps,
>
> Matthew Wiener
> RY84-202
> Applied Computer Science & Mathematics Dept.
> Merck Research Labs
> 126 E. Lincoln Ave.
> Rahway, NJ 07065
> 732-594-5303
>
> -----Original Message-----
> From: Andrew Baek [mailto:andrew at stat.ucla.edu]
> Sent: Wednesday, September 25, 2002 3:52 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] CART vs. Random Forest
>
>
> According to Dr. Breiman, RF should be a more accurate
> method than a single tree. However, the performance of each
> method seems to depend on the proportion of the outcome variable
> in my case. My data set is a typical classification problem
> (predict bad guys). When I ran both of them with different
> proportions of the outcome variable (there's a criterion to measure
> the degree of bad behavior), I got very strange results.
>
> 1. proportion of 1 to 0 = 1:4
> err.rate of CART = 25.2%
> err.rate of RF = 25.6%
>
> 2. 1:9
> err.rate of CART = 28.5%
> err.rate of RF = 21.2%
>
> 3. 1:33
> err.rate of CART = 28.2%
> err.rate of RF = 12.1%
>
> 4. 1:99
> err.rate of CART = 25.1%
> err.rate of RF = 7.3%
>
>
> In 3 & 4, RF looks superior to CART. But I'm afraid RF just
> votes for "0" to reduce the error rate. Any suggestions?
> Thank you.
>
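
One way to check the concern above is to look at the error rate within each class rather than only the overall rate. The sketch below (Python) uses made-up labels, not the poster's data, and the function name `class_error_rates` is hypothetical.

```python
def class_error_rates(actual, predicted):
    """Return (overall error rate, {class: error rate within that class})."""
    overall = sum(a != p for a, p in zip(actual, predicted)) / len(actual)
    per_class = {}
    for cls in sorted(set(actual)):
        idx = [i for i, a in enumerate(actual) if a == cls]
        per_class[cls] = sum(predicted[i] != cls for i in idx) / len(idx)
    return overall, per_class


# Made-up 1:99 data and a classifier that always predicts 0.
actual = [0] * 99 + [1]
predicted = [0] * 100

overall, per_class = class_error_rates(actual, predicted)
print("overall error:", overall)
print("per-class error:", per_class)
```

A classifier that only votes "0" would show a low overall error but a 100% error rate on class 1, so the per-class rates show directly whether that is happening.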
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
> -.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
> _._
>