[R] randomForest speed improvements
Liaw, Andy
andy_liaw at merck.com
Wed Jan 5 19:01:13 CET 2011
From: Liaw, Andy
>
> Note that that isn't exactly what I recommended. If you look at the
> example in the help page for combine(), you'll see that it is combining
> RF objects trained on the same data; i.e., instead of having one RF with
> 500 trees, you can combine five RFs trained on the same data with 100
> trees each into one 500-tree RF.
>
> The way you are using combine() is basically using sample size to limit
> tree size, which you can do by playing with the nodesize argument in
> randomForest() as I suggested previously. Either way is fine as long as
> you don't see prediction performance degrading.
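To make the two options quoted above concrete, here is a minimal sketch. The
iris data is only a stand-in, and the object names and nodesize = 20 are my
own choices for illustration, not anything from the original question. It
grows five 100-tree forests on the same data and merges them with combine(),
then grows one forest with a larger nodesize to show the trees getting
smaller:

```r
library(randomForest)

set.seed(1)

# Five forests of 100 trees each, all grown on the same data
rf_list <- lapply(1:5, function(i) {
  randomForest(Species ~ ., data = iris, ntree = 100)
})

# Merge them into a single 500-tree forest
rf_all <- do.call(randomForest::combine, rf_list)
rf_all$ntree

# Alternative: limit tree size directly with nodesize
rf_default <- randomForest(Species ~ ., data = iris, ntree = 100)
rf_small   <- randomForest(Species ~ ., data = iris, ntree = 100,
                           nodesize = 20)

# treesize() gives the number of terminal nodes in each tree
mean(treesize(rf_default))
mean(treesize(rf_small))
```

Keep in mind that the help page for combine() says the error-rate and
confusion components of the combined object are set to NULL, so check
prediction performance on held-out data.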
I should also mention that you can get a similar effect with the sampsize
argument in randomForest(). For example, if you call randomForest() with
sampsize=500, it will randomly draw 500 data points to grow each tree.
This way you don't even need to run the RFs separately and combine them.
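A minimal sketch of the sampsize route, on simulated data shaped like the
25,000 x 6 case described in the thread. The data, the variable names, and
ntree = 100 are assumptions for illustration only:

```r
library(randomForest)

set.seed(42)

# Simulated stand-in: 25,000 rows, 5 predictors (X1..X5) plus a response
n <- 25000
dat <- data.frame(matrix(rnorm(n * 5), ncol = 5))
dat$y <- dat$X1 - 2 * dat$X2 + rnorm(n)

# Each tree is grown on a random draw of 500 rows rather than all 25,000
rf <- randomForest(y ~ ., data = dat, ntree = 100, sampsize = 500)
print(rf)
```

Wrapping the call in system.time() with and without sampsize is an easy way
to see what it buys you on your own data.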
Andy
> Andy
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of apresley
> > Sent: Tuesday, January 04, 2011 6:30 PM
> > To: r-help at r-project.org
> > Subject: Re: [R] randomForest speed improvements
> >
> >
> > Andy,
> >
> > Thanks for the reply. I had no idea I could combine them back ... that
> > actually will work pretty well. We can have several "worker threads" load
> > up the RFs on different machines and/or cores, and then re-assemble them.
> > Rmpi might be an option down the road, but would be a bit of overhead for
> > us now.
> >
> > Using the combine() method ... I was able to drastically reduce the
> > amount of time to build randomForest objects. I.e., using about 25,000
> > rows (6 columns), it takes maybe 5 minutes on my laptop. Using 5
> > randomForest objects (each with 5k rows), and then combining them, takes
> > < 1 minute.
> >
> > --
> > Anthony
> > --
> > View this message in context:
> > http://r.789695.n4.nabble.com/randomForest-speed-improvements-tp3172523p3174621.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> Notice: This e-mail message, together with any
> attachme...{{dropped:11}}