[R] randomForest speed improvements
Jonathan P Daily
jdaily at usgs.gov
Mon Jan 3 21:10:55 CET 2011
Have you tried adjusting:
mtry - the number of variables tried at each split
ntree - the number of trees grown
keep.forest - logical; whether to keep the forest in the returned object
Specifically, I have seen huge speed improvements from setting keep.forest
to FALSE in cases where I didn't need the forest after the analysis.
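A rough sketch of what that could look like, using made-up data in place of your CSV (the data, the row split, and the particular ntree value are only assumptions for illustration). One caveat: predict() needs the stored forest, so with keep.forest=FALSE the held-out rows have to be passed as xtest, and the predictions come back in the fitted object. For six predictors in regression, mtry = 2 is already the default; it is written out only to show where it goes.

```r
library(randomForest)

# Synthetic stand-in for the real data (hypothetical values):
# six numeric predictors and a numeric target.
set.seed(1)
n <- 2000
x <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
y <- x[[1]] + 2 * x[[2]] + rnorm(n)

train <- 1:1500
test  <- 1501:n

# Default settings for comparison: 500 trees, forest kept.
t_default <- system.time(
  rf_default <- randomForest(x[train, ], y[train])
)

# Fewer trees and no stored forest. keep.forest = FALSE rules out a
# later predict() call, so the held-out rows go in as xtest and are
# predicted while the trees are grown.
t_fast <- system.time(
  rf_fast <- randomForest(x[train, ], y[train],
                          xtest = x[test, ],
                          ntree = 100,
                          mtry = 2,
                          keep.forest = FALSE)
)

# Predictions for the held-out rows.
p <- rf_fast$test$predicted

# Compare elapsed times of the two fits.
print(rbind(default = t_default, fast = t_fast))
```

How much this helps will depend on your data, so it is worth timing both versions on a subset before running all 50,000 rows.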
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
"Is the room still a room when its empty? Does the room,
the thing itself have purpose? Or do we, what's the word... imbue it."
- Jubal Early, Firefly
r-help-bounces at r-project.org wrote on 01/03/2011 02:59:29 PM:
> Hi there,
> We're trying to use randomForest to do some predictions. Our code is
> pretty straightforward:
> library ('randomForest');
> data202 <- read.csv ("random.csv", header=TRUE);
> x<- data202[1:50000,1:6];
> y<- data202[1:50000,8];
> y<- y[,drop=TRUE];
> x2 <- data202[50001:60000,1:6];
> y2 <- data202[50001:60000,8];
> y2 <- y2[,drop=TRUE];
> RFobject <- randomForest(x,y,na.action=na.roughfix);
> p <- predict (RFobject, x2);
> In this case, the CSV contains 10 columns, of which 1-6 are numeric
> (day of week, week of month, etc.) and column 8 is the target
> (sales, a numeric value).
> randomForest does fine with the data; our issue is how long it takes. In
> this case, about 5,000 rows of data take just a few seconds, but
> going to 50,000 rows doesn't take 5x the time, it takes perhaps 30 or 40
> We've downloaded and tried RT-Rank, which is a multi-threaded version of
> RandomForest, and this seems to produce the same (or slightly better)
> predictions, but also gets done fairly quickly.
> What can we do to improve the speed of this computation? We're on a
> dual quad-core Intel CPU @ 2.33GHz with 16GB of RAM, and we're using
> the "stock" R RPM for CentOS 5.5.