[R] randomForest speed improvements

Liaw, Andy andy_liaw at merck.com
Tue Jan 4 17:24:03 CET 2011

If you have multiple cores, one "poor man's solution" is to grow separate
forests in different R sessions, save the RF objects, load them into a
single session, and combine() them.  You can do this less clumsily with
Rmpi or other distributed computing packages.
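A minimal sketch of the combine() approach, assuming the randomForest package and the parallel package that ships with R (used here in place of separate sessions or Rmpi). The four workers, the 125-tree split, and the iris data are illustrative choices, not part of the advice above.

```r
library(randomForest)
library(parallel)

## Assumption: 4 worker processes stand in for the "separate R sessions";
## each grows 125 trees on the built-in iris data (illustrative only).
cl <- makeCluster(4)
clusterSetRNGStream(cl, 2011)                  # independent RNG stream per worker
invisible(clusterEvalQ(cl, library(randomForest)))

forests <- parLapply(cl, 1:4, function(i) {
  randomForest(Species ~ ., data = iris, ntree = 125)
})
stopCluster(cl)

## With truly separate sessions, each would call saveRDS(rf, file) and the
## collecting session would read them back with lapply(files, readRDS).
rf <- do.call(randomForest::combine, forests)  # one 500-tree forest
rf$ntree
pred <- predict(rf, newdata = iris)
```

The combined object can be used with predict() as usual, but out-of-bag summaries such as err.rate and confusion are not carried over into it.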

Another consideration is to increase nodesize, which reduces the size of
the trees.  The problem with numeric predictors for tree-based
algorithms is that the number of candidate splitting points, and hence
the computation needed to find the best one, grows with the number of
distinct values of the predictor, and that cost is paid _at each node_.
Some algorithms save on this by considering only certain quantiles as
split points.  The current RF code doesn't do this.
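A rough sketch of both ideas, assuming simulated regression data of about the size discussed in the thread. The data, ntree = 50, nodesize = 50, and decile binning are illustrative assumptions. randomForest has no quantile option, so the second part coarsens the predictors by hand before fitting; new data would need to be binned with the same breaks, and the coarsening may cost some accuracy. Timings depend on the machine, so none are shown.

```r
library(randomForest)

## Hypothetical data: 6 numeric predictors and 20,000 rows, so nearly every
## value is distinct and each node has many candidate split points.
set.seed(1)
n <- 20000
x <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
y <- x$V1 + x$V2^2 + rnorm(n)

## 1. Larger nodesize: trees stop splitting earlier, so there are fewer
##    nodes and fewer split searches (the regression default is 5).
t.default <- system.time(randomForest(x, y, ntree = 50))
t.big     <- system.time(randomForest(x, y, ntree = 50, nodesize = 50))

## 2. User-side stand-in for quantile-based splitting (not an RF option):
##    replace each predictor by its decile bin, leaving at most 10 distinct
##    values (9 candidate split points) per predictor.
to.deciles <- function(v) {
  br <- unique(quantile(v, probs = seq(0, 1, 0.1)))
  as.numeric(cut(v, breaks = br, include.lowest = TRUE))
}
x.binned <- as.data.frame(lapply(x, to.deciles))
t.binned <- system.time(randomForest(x.binned, y, ntree = 50))

timings <- c(default    = t.default[["elapsed"]],
             nodesize50 = t.big[["elapsed"]],
             deciles    = t.binned[["elapsed"]])
timings
```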



> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of apresley
> Sent: Monday, January 03, 2011 6:28 PM
> To: r-help at r-project.org
> Subject: Re: [R] randomForest speed improvements
> I haven't tried changing the mtry or ntree at all ... though I suppose
> with only 6 variables and tens of thousands of rows, we can probably do
> fewer than 500 trees (the default?).
> Although tossing the forest does speed things up a bit (it seems to be
> about 15-20% faster in some cases), I need to keep the forest to do the
> prediction; otherwise, it complains that there is no forest component
> in the object.
> --
> Anthony
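For the situation in the quoted message, where predictions are needed but keeping the forest is costly, one option in randomForest is to pass the new data as xtest: the new rows are predicted while the trees are grown, and keep.forest then defaults to FALSE. This is a sketch assuming the new rows are available at training time; the iris split and ntree = 200 (fewer than the default 500) are illustrative.

```r
library(randomForest)

## Hypothetical split of iris into training rows and "new" rows.
set.seed(42)
idx   <- sample(nrow(iris), 100)
x.tr  <- iris[idx, 1:4]
y.tr  <- iris$Species[idx]
x.new <- iris[-idx, 1:4]

## Supplying xtest makes keep.forest default to FALSE, so the trees are
## discarded after the test rows have been run down them.
rf <- randomForest(x.tr, y.tr, xtest = x.new, ntree = 200)
is.null(rf$forest)            # no forest component is kept
pred <- rf$test$predicted     # class predictions for x.new
```

Without a stored forest, predict() cannot be called on this object later, so this only helps when all rows to be predicted are known up front.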