[R] Using unbalanced-learning algorithms in the randomForest package

Byron Dom byron_dom at yahoo.com
Sat May 17 01:18:18 CEST 2014


Responding to my own post/question here.

Andy Liaw directed me to the following page, which answers my question:
http://grokbase.com/t/r/r-help/05av0aaa2e/r-repost-examples-of-classwt-strata-and-sampsize-i-n-randomforest
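
For anyone who finds this thread later, here is a minimal sketch of how the arguments named in that page's title (classwt, strata, and sampsize) might be used on a 2-class problem. The synthetic data, the class labels "rare" and "common", the weight of 10, and ntree = 200 are all my own illustrative assumptions. I am also only assuming that stratified down-sampling approximates "balanced RF" and that classwt approximates "weighted RF"; the package's classwt handling may not match the report's algorithm exactly.

```r
library(randomForest)

set.seed(1)

# Synthetic, illustrative data: 50 minority cases and 950 majority cases.
n_rare   <- 50
n_common <- 950
y <- factor(c(rep("rare", n_rare), rep("common", n_common)))
x <- data.frame(
  x1 = rnorm(n_rare + n_common, mean = ifelse(y == "rare", 1.5, 0)),
  x2 = rnorm(n_rare + n_common)
)

# "Balanced"-style forest: stratify the per-tree sample by class and
# draw the same number of cases (n_rare) from each class.
rf_balanced <- randomForest(
  x, y,
  strata   = y,
  sampsize = c(n_rare, n_rare),  # one entry per class, in levels(y) order
  ntree    = 200
)

# "Weighted"-style forest: give the minority class a larger class weight.
# The value 10 is arbitrary and would need tuning.
rf_weighted <- randomForest(
  x, y,
  classwt = c(common = 1, rare = 10),
  ntree   = 200
)

# Compare out-of-bag confusion matrices, including per-class error.
print(rf_balanced$confusion)
print(rf_weighted$confusion)
```

In either case it is the per-class error columns of the confusion matrices that are worth comparing, since overall error alone hides how the minority class is doing.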

----------------------------------- original post ---------------------------------------------------
Date: Tue, 6 May 2014 22:54:22 -0700 (PDT)
From: Byron Dom <byron_dom at yahoo.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Subject: [R] Using unbalanced-learning algorithms in the randomForest
    package.
Message-ID:
    <1399442062.12706.YahooMailNeo at web142801.mail.bf1.yahoo.com>
Content-Type: text/plain
In archive: https://stat.ethz.ch/pipermail/r-help/2014-May/374384.html

The following report by the authors of the randomForest package describes two different algorithm modifications for using random forests to learn classifiers for "unbalanced" learning problems in which one class is much less frequent than the other (in 2-class problems). These two variations are called "balanced RF" and "weighted RF."
http://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf


Would someone please answer these three questions?
(1) Is it possible to use the R randomForest package to learn random forests using either of these modified RF-learning algorithms? 
(2) If it is possible, how does one do it?
(3) Is there some detailed documentation for running these modified versions? I've read the R package manual but it's too sketchy. It seems to be primarily for users who are already familiar with the package and just need to look up some detail like the name of an argument.


