[R] CART vs. Random Forest

Wiener, Matthew matthew_wiener at merck.com
Thu Sep 26 14:51:18 CEST 2002


If either method were just guessing 0 to reduce the error rate, it would
misclassify only the 1's, so shouldn't it achieve a 1/34 ~ 3% or 1/100 = 1%
error rate in the last two examples? And for that matter 20% and 10% in the
first two? It doesn't look like that's what's going on.
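
To make that comparison concrete, here is a small sketch in Python. The helper name `majority_baseline` is my own, purely for illustration. It computes the error rate a classifier would get by always predicting 0 at each of the four class ratios and prints it beside the error rates reported in the original message:

```python
from fractions import Fraction

def majority_baseline(ones, zeros):
    """Error rate of a classifier that always predicts 0.

    Every 1 is misclassified and every 0 is correct, so the error
    rate is simply the share of 1's in the data.
    """
    return Fraction(ones, ones + zeros)

# (ratio of 1's to 0's) -> (CART, RF) error rates in percent, as reported
reported = {
    (1, 4): (25.2, 25.6),
    (1, 9): (28.5, 21.2),
    (1, 33): (28.2, 12.1),
    (1, 99): (25.1, 7.3),
}

for (ones, zeros), (cart, rf) in reported.items():
    base = 100 * float(majority_baseline(ones, zeros))
    print(f"1:{zeros}  always-0 = {base:.1f}%  CART = {cart}%  RF = {rf}%")
```

In all four cases both methods make more errors than the always-0 rule would, so neither can be predicting 0 for every case.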

One suggestion if making sure you find the 1's is more important than having
a low overall error rate:  in rpart, you can specify a loss matrix to say
that certain kinds of errors are more important than others.  In a random
forest, you can use different voting thresholds for "1-ness" and "0-ness" to
bias things -- that is, instead of just taking majority vote, you might
require (for example) 85% of the trees to agree for something to be declared
in class 0.
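
The voting-threshold idea can be sketched as follows. This is a minimal illustration in Python, not the interface of any particular random forest package; the function name `classify_by_votes` and the 0.85 default are assumptions chosen to match the example above.

```python
def classify_by_votes(tree_votes, zero_threshold=0.85):
    """Label a case 0 only if enough trees vote 0; otherwise label it 1.

    tree_votes     -- list of 0/1 predictions, one per tree in the forest
    zero_threshold -- fraction of trees that must vote 0 (hypothetical default)
    """
    if not tree_votes:
        raise ValueError("need at least one vote")
    share_zero = tree_votes.count(0) / len(tree_votes)
    return 0 if share_zero >= zero_threshold else 1

# 70 of 100 trees vote 0: a plain majority vote would say 0,
# but the stricter threshold flags the case as a 1.
votes = [0] * 70 + [1] * 30
print(classify_by_votes(votes, zero_threshold=0.5))  # majority vote
print(classify_by_votes(votes))                      # 85% required for class 0
```

Raising the threshold accepts more false alarms (0's called 1) in exchange for fewer missed 1's, which is roughly the same trade-off a loss matrix expresses for a single tree.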

It's hard to say much more without knowing anything about your data.  But in
my experience random forests have substantially outperformed single trees in
many problems (and I haven't yet encountered one in which a single tree
outperformed a random forest).

Hope this helps,

Matthew Wiener
RY84-202
Applied Computer Science & Mathematics Dept.
Merck Research Labs
126 E. Lincoln Ave.
Rahway, NJ 07065
732-594-5303 

-----Original Message-----
From: Andrew Baek [mailto:andrew at stat.ucla.edu]
Sent: Wednesday, September 25, 2002 3:52 PM
To: r-help at stat.math.ethz.ch
Subject: [R] CART vs. Random Forest


According to Dr. Breiman, RF should be a more accurate
method than a single tree. However, the performance of each
method seems to depend on the proportion of the outcome variable
in my case. My data set is a typical classification problem
(predict bad guys). When I ran both of them with different
proportions of the outcome variable (there's a criterion to measure
the degree of bad behavior), I got very strange results.

1. proportion of 1 to 0 = 1:4
err.rate of CART = 25.2%
err.rate of RF = 25.6%

2. 1:9 
err.rate of CART = 28.5%
err.rate of RF = 21.2%

3. 1:33
err.rate of CART = 28.2%
err.rate of RF = 12.1%

4. 1:99
err.rate of CART = 25.1%
err.rate of RF = 7.3%


In 3 & 4, RF looks superior to CART. But I'm afraid RF just
votes for "0" to reduce the error rate. Any suggestions?
Thank you. 

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._




