[R] randomForest speed improvements

Liaw, Andy andy_liaw at merck.com
Wed Jan 5 19:01:13 CET 2011


From: Liaw, Andy
> 
> Note that that isn't exactly what I recommended.  If you look at the
> example in the help page for combine(), you'll see that it is combining
> RF objects trained on the same data; i.e., instead of having one RF with
> 500 trees, you can combine five RFs trained on the same data with 100
> trees each into one 500-tree RF.
> 
> The way you are using combine() is basically using sample size to limit
> tree size, which you can do by playing with the nodesize argument in
> randomForest() as I suggested previously.  Either way is fine as long as
> you don't see prediction performance degrading.
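
A minimal sketch of the combine() usage described in the quoted text, for
anyone reading along in the archive.  The iris data is only a stand-in
(an assumption, not the poster's data), and norm.votes=FALSE follows the
example on the combine() help page.

```r
library(randomForest)

set.seed(1)
data(iris)

# Grow five 100-tree forests on the SAME data. The five calls are
# independent of each other, so in practice each could run on its own
# core or machine (with a different random seed per worker).
forests <- lapply(1:5, function(i) {
  randomForest(Species ~ ., data = iris, ntree = 100, norm.votes = FALSE)
})

# Merge the five forests into a single 500-tree forest.
rf_all <- do.call(randomForest::combine, forests)
rf_all$ntree  # 500
```

Note that the combined object no longer carries the out-of-bag error
components (err.rate, confusion), so prediction performance has to be
checked some other way, e.g. on a held-out set.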

I should also mention that another way you can do something similar is
by making use of the sampsize argument in randomForest().  For example,
if you call randomForest() with sampsize=500, it will randomly draw 500
data points to grow each tree.  This way you don't even need to run the
RFs separately and combine them.  
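
A rough sketch of the two single-call alternatives, sampsize and nodesize.
The simulated data, the object names, and the particular values (5000
rows, nodesize=50, ntree=100) are all hypothetical and chosen only to keep
the example quick to run.

```r
library(randomForest)

set.seed(1)

# Hypothetical regression data: 5000 rows, 6 predictor columns.
n <- 5000
x <- data.frame(matrix(rnorm(n * 6), ncol = 6))
y <- x[[1]] + rnorm(n)

# Each tree is grown on a random draw of 500 rows, so no separate
# forests need to be trained and combined afterwards.
rf_samp <- randomForest(x, y, ntree = 100, sampsize = 500)

# Alternative: keep the full-size sample per tree but force larger
# terminal nodes, which also yields smaller trees.
rf_node <- randomForest(x, y, ntree = 100, nodesize = 50)

# Compare tree sizes (number of terminal nodes per tree).
summary(treesize(rf_samp))
summary(treesize(rf_node))
```

Either setting trades tree size for speed, so it is worth comparing
prediction error against a default forest before settling on one.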

Andy


> Andy
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org 
> > [mailto:r-help-bounces at r-project.org] On Behalf Of apresley
> > Sent: Tuesday, January 04, 2011 6:30 PM
> > To: r-help at r-project.org
> > Subject: Re: [R] randomForest speed improvements
> > 
> > 
> > Andy,
> > 
> > Thanks for the reply.  I had no idea I could combine them back ... that
> > actually will work pretty well.  We can have several "worker threads"
> > load up the RFs on different machines and/or cores, and then re-assemble
> > them.  Rmpi might be an option down the road, but would be a bit of
> > overhead for us now.
> > 
> > Using the combine() method, I was able to drastically reduce the amount
> > of time to build randomForest objects.  For example, using about 25,000
> > rows (6 columns), it takes maybe 5 minutes on my laptop.  Using 5
> > randomForest objects (each with 5k rows), and then combining them, takes
> > < 1 minute.
> > 
> > --
> > Anthony
> > -- 
> > View this message in context: 
> > http://r.789695.n4.nabble.com/randomForest-speed-improvements-tp3172523p3174621.html
> > Sent from the R help mailing list archive at Nabble.com.
> > 
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> > 
> Notice:  This e-mail message, together with any 
> attachme...{{dropped:11}}