[R] class weights with Random Forest

Liaw, Andy andy_liaw at merck.com
Tue Sep 13 20:29:45 CEST 2011

The current "classwt" option in the randomForest package has been there since the beginning, and it differs from how the official Fortran code (version 4 and later) implements class weights. It simply accounts for the class weights in the Gini index calculation when splitting nodes, exactly as a single CART tree does when given class weights. Prof. Breiman came up with the newer class weighting scheme, implemented in the newer versions of his Fortran code, after we found that simply using the weights in the Gini index didn't seem to help much with extremely unbalanced data (say 1:100 or worse). If using weighted Gini helps in your situation, by all means use it. I can only say that in the past it didn't give us the results we were expecting.
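To make "class weights in the Gini index" concrete, here is a minimal Python sketch. It assumes the usual CART-style weighting, in which each observation counts with its class's weight when the class proportions of a node are computed. The function names are hypothetical. This is not the randomForest Fortran code, and it does not show Breiman's newer weighting scheme. It only illustrates the weighted-Gini idea described above.

```python
from collections import Counter


def weighted_gini(labels, class_weights):
    """Gini impurity where each observation counts with the weight of its class.

    Assumption: CART-style weighting, p_k = w_k * n_k / sum_j(w_j * n_j).
    """
    counts = Counter(labels)
    total = sum(class_weights[c] * n for c, n in counts.items())
    if total == 0:
        return 0.0
    return 1.0 - sum((class_weights[c] * n / total) ** 2 for c, n in counts.items())


def weighted_split_decrease(left, right, class_weights):
    """Weighted impurity decrease from splitting a node into left/right children."""
    def node_weight(labels):
        # Total weight of a node: sum of the class weights of its observations.
        return sum(class_weights[c] for c in labels)

    parent = left + right
    w_parent = node_weight(parent)
    w_left, w_right = node_weight(left), node_weight(right)
    return (weighted_gini(parent, class_weights)
            - (w_left / w_parent) * weighted_gini(left, class_weights)
            - (w_right / w_parent) * weighted_gini(right, class_weights))


if __name__ == "__main__":
    # A hypothetical node with a 1:100 class imbalance.
    node = ["rare"] * 1 + ["common"] * 100
    equal = {"rare": 1.0, "common": 1.0}
    upweighted = {"rare": 100.0, "common": 1.0}
    print(weighted_gini(node, equal))       # ≈ 0.0196
    print(weighted_gini(node, upweighted))  # 0.5
```

With equal weights, the 1:100 node already looks almost pure, so a split that isolates the rare case gains very little impurity reduction. Upweighting the rare class by 100 makes the same node look balanced. Whether that actually improves the forest's predictions on extremely unbalanced data is exactly what the post above questions.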


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of James Long
> Sent: Tuesday, September 13, 2011 2:10 AM
> To: r-help at r-project.org
> Subject: [R] class weights with Random Forest
> Hi All,
> I am looking for a reference that explains how the randomForest function in the randomForest package uses the classwt parameter. Here:
> http://tolstoy.newcastle.edu.au/R/e4/help/08/05/12088.html
> Andy Liaw suggests not using classwt. And according to:
> http://r.789695.n4.nabble.com/R-help-with-RandomForest-classwt-option-td817149.html
> it has "not been implemented" as of 2007. However it improved classification performance for a problem I am working on, more than adjusting the sampsize parameter. So I'm wondering if it has been implemented recently (since 2007) or if there is a detailed explanation of what this unimplemented version is doing.
> Thanks!
> James
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
