[R] randomForest speed improvements
Liaw, Andy
andy_liaw at merck.com
Wed Jan 5 16:20:40 CET 2011
Note that this isn't exactly what I recommended. If you look at the
example in the help page for combine(), you'll see that it is combining
RF objects trained on the same data; i.e., instead of having one RF with
500 trees, you can combine five RFs trained on the same data with 100
trees each into one 500-tree RF.
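
As a rough sketch of that pattern (not the help page example verbatim):
the iris data and the names forests and rf_all are only illustrative
assumptions. Each of the five fits sees the full data set, so the
separate calls could be farmed out to different cores or machines.

```r
library(randomForest)

set.seed(42)
data(iris)

# Five forests of 100 trees each, all grown on the SAME full data set.
# Each call is independent, so it could run on a separate core or machine.
forests <- lapply(1:5, function(i) {
  randomForest(Species ~ ., data = iris, ntree = 100)
})

# Merge the five forests into a single 500-tree forest.
rf_all <- do.call(randomForest::combine, forests)

# Total number of trees in the combined forest.
rf_all$ntree

# The combined object predicts like any other randomForest object.
head(predict(rf_all, newdata = iris))
```

The combined object does not carry the out-of-bag error summaries
(err.rate, confusion) of its parts, so check performance on held-out
data if you need an error estimate.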
The way you are using combine() is basically using sample size to limit
tree size, which you can also do by adjusting the nodesize argument in
randomForest(), as I suggested previously. Either way is fine as long as
you don't see prediction performance degrading.
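
A small sketch of the nodesize route, again on iris purely for
illustration; the value nodesize = 20 and the object names are arbitrary
assumptions, not recommendations. It compares tree sizes and out-of-bag
error so you can see whether performance degrades.

```r
library(randomForest)

set.seed(42)
data(iris)

# Default nodesize for classification is 1, so trees are grown very deep.
rf_default <- randomForest(Species ~ ., data = iris, ntree = 100)

# A larger minimum terminal-node size gives smaller (and faster) trees.
rf_small <- randomForest(Species ~ ., data = iris, ntree = 100,
                         nodesize = 20)

# treesize() gives the number of terminal nodes in each tree.
mean(treesize(rf_default))
mean(treesize(rf_small))

# Out-of-bag error after all 100 trees, to check for degradation.
rf_default$err.rate[100, "OOB"]
rf_small$err.rate[100, "OOB"]
```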
Andy
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of apresley
> Sent: Tuesday, January 04, 2011 6:30 PM
> To: r-help at r-project.org
> Subject: Re: [R] randomForest speed improvements
>
>
> Andy,
>
> Thanks for the reply. I had no idea I could combine them back ... that
> actually will work pretty well. We can have several "worker threads"
> load up the RFs on different machines and/or cores, and then
> re-assemble them. Rmpi might be an option down the road, but would be
> a bit of overhead for us now.
>
> Using the method of combine() ... I was able to drastically reduce the
> amount of time to build randomForest objects. For example, using about
> 25,000 rows (6 columns), it takes maybe 5 minutes on my laptop. Using 5
> randomForest objects (each with 5k rows), and then combining them,
> takes < 1 minute.
>
> --
> Anthony