[R] randomForest speed improvements
Jonathan P Daily
jdaily at usgs.gov
Mon Jan 3 21:10:55 CET 2011
Have you tried adjusting:
mtry - the number of variables randomly sampled as candidates at each split
ntree - the number of trees grown
keep.forest - logical; whether to retain the forest in the returned object
Specifically, I found huge improvements in speed by switching keep.forest
to FALSE in the past when I didn't actually need the forest post analysis.
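
A rough sketch of how those three arguments might be combined, in case it helps. The data here are synthetic stand-ins (the names dat, train and test are made up for illustration), and the values chosen for ntree and mtry are arbitrary starting points, not recommendations. Because keep.forest = FALSE discards the trees, the hold-out rows are passed as xtest so that their predictions are computed during the fit:

```r
library(randomForest)

set.seed(1)

# Synthetic stand-in for the poster's data: 6 numeric predictors (X1..X6)
# and a numeric target. Real data would come from read.csv() instead.
n <- 2000
dat <- data.frame(matrix(runif(n * 6), ncol = 6))
dat$sales <- 10 * dat$X1 + 5 * dat$X2 + rnorm(n)

train <- 1:1500
test  <- 1501:n

x  <- dat[train, 1:6]
y  <- dat$sales[train]
x2 <- dat[test, 1:6]

fit <- randomForest(x, y,
                    xtest = x2,           # predict hold-out rows during the fit
                    ntree = 100,          # fewer trees than the default of 500
                    mtry = 2,             # variables tried at each split
                    keep.forest = FALSE)  # do not store the trees

# Predictions for the xtest rows are stored in the fitted object.
p <- fit$test$predicted
```

Note that with keep.forest = FALSE there is no forest left to call predict() on afterwards, so every set of rows you want predictions for has to be supplied as xtest at fit time.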
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when it's empty? Does the room,
the thing itself have purpose? Or do we, what's the word... imbue it."
- Jubal Early, Firefly
r-help-bounces at r-project.org wrote on 01/03/2011 02:59:29 PM:
> From: apresley
> To: r-help
> Date: 01/03/2011 03:03 PM
> Subject: [R] randomForest speed improvements
> Sent by: r-help-bounces at r-project.org
>
> Hi there,
>
> We're trying to use randomForest to do some predictions. The test harness
> for our code is pretty straightforward:
>
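> library(randomForest)
> data202 <- read.csv("random.csv", header = TRUE)
> # training set: first 50,000 rows; predictors in columns 1-6, target in column 8
> x <- data202[1:50000, 1:6]
> y <- data202[1:50000, 8]
> # drops unused factor levels (applies only if column 8 was read as a factor)
> y <- y[, drop = TRUE]
>
> # hold-out set: the next 10,000 rows
> x2 <- data202[50001:60000, 1:6]
> y2 <- data202[50001:60000, 8]
> y2 <- y2[, drop = TRUE]
>
> # fit the forest, then predict the hold-out rows
> RFobject <- randomForest(x, y, na.action = na.roughfix)
> p <- predict(RFobject, x2)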
>
> In this case, the CSV contains 10 columns, of which 1-6 are numeric in
> nature (day of week, week of month, etc...) and column 8 is the target
> (sales, a numeric value).
>
> randomForest does fine with the data; our issue is how long it takes. In
> this case, about 5,000 rows of data seems to take just a few seconds, but
> going to 50,000 rows doesn't take 10x the time, it takes perhaps 30 or 40
> minutes.
>
> We've downloaded and tried RT-Rank, which is a multi-threaded version of
> RandomForest, and this seems to produce the same (or slightly better)
> predictions, but also gets done fairly quickly.
>
> What can we do to improve the speed of this data computation? The system
> we're on is a dual quad-core Intel CPU @ 2.33 GHz with 16GB of RAM ...
> we're using the "stock" R RPM for CentOS 5.5.
>
> Thanks!
>
> --
> Anthony
> --
> View this message in context:
> http://r.789695.n4.nabble.com/randomForest-speed-improvements-tp3172523p3172523.html
> Sent from the R help mailing list archive at Nabble.com.