[R] Random Forest: how to do an automatic rerun using only the important variables

Liaw, Andy andy_liaw at merck.com
Mon Apr 12 22:33:34 CEST 2004


That's the advantage of having an R interface to RF: you can do such
automation rather easily.  E.g.,

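library(randomForest)

twoStageRF <- function(x, y, nVar=max(1, round(0.1*ncol(x))), ...) {
  ## Stage 1: fit on all variables; type=1 is the permutation-based measure
  imp <- importance(randomForest(x, y, importance=TRUE, ...), type=1)[, 1]
  ## Importance of the nVar-th best variable (ties may keep a few extra)
  cutoff <- sort(imp, decreasing=TRUE)[nVar]
  ## Stage 2: refit using only the variables at or above the cutoff
  randomForest(x[, imp >= cutoff, drop=FALSE], y, ...)
}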

[Disclaimer: I just wrote the function on the spot, so completely untested.
This is just to demonstrate how simple it would be.  You can embellish it as
much as you'd like.]
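
For a concrete run, the same two stages can be spelled out step by step.
What follows is only a sketch: the built-in iris data, the fixed seed, and
nVar=2 are arbitrary choices made for illustration, and the names rf1, rf2
and keep are not from the message above.  It uses the importance()
extractor with type=1 (mean decrease in accuracy) rather than a column
index, since the column layout of $importance depends on the package
version and on the number of classes.

```r
library(randomForest)

set.seed(1)                 # forests are random; fix the seed to reproduce
x <- iris[, 1:4]
y <- iris$Species

## Stage 1: fit on all predictors and extract permutation importance
rf1 <- randomForest(x, y, importance=TRUE)
imp <- importance(rf1, type=1)[, 1]

## Keep the nVar most important predictors (nVar=2 is an arbitrary choice)
nVar <- 2
keep <- names(sort(imp, decreasing=TRUE))[seq_len(nVar)]

## Stage 2: refit on the reduced predictor set
rf2 <- randomForest(x[, keep, drop=FALSE], y)

print(keep)
## Compare the final out-of-bag error rates of the two fits
print(c(all=rf1$err.rate[rf1$ntree, "OOB"],
        reduced=rf2$err.rate[rf2$ntree, "OOB"]))
```

Keeping the selected names in a vector makes it easy to see which variables
survived.  Note that the OOB error of the second fit is optimistic, because
the variables were chosen using the same data.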

I have written a function that uses CV to choose the `optimal' number of
variables to keep (rather than blindly select one up front).  I might toss
it in the next version of the package...
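
Current releases of the randomForest package include rfcv(), which
cross-validates the error over nested subsets of predictors ranked by
importance.  Whether it is the same function mentioned above is an
assumption.  A minimal sketch on the iris data (the data set, seed, fold
count and step are arbitrary choices for illustration):

```r
library(randomForest)

set.seed(1)
x <- iris[, 1:4]
y <- iris$Species

## 5-fold CV over nested predictor subsets; step=0.5 halves the count each time
cv <- rfcv(x, y, cv.fold=5, step=0.5)

## error.cv holds the CV error rate for each number of predictors tried
print(cv$error.cv)

## Number of predictors with the smallest CV error
## (which.min takes the first minimum, so ties go to the larger subset)
best <- cv$n.var[which.min(cv$error.cv)]
print(best)
```

The value of best could then be passed as nVar to a two-stage function like
the one above, instead of fixing the number of variables up front.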

HTH,
Andy

> From: Hui Han
> 
> Hi,
> 
> I am using the Random Forest Package, and want to do an automatic rerun
> using only those variables that were most important in the original run.
> Is there anybody who has experience with this and can give me helpful
> suggestions?
> 
> Best regards,
> 
> Hui Han
> Department of Computer Science and Engineering,
> The Pennsylvania State University 
> University Park, PA,16802
> email: hhan at cse.psu.edu
> homepage: http://www.cse.psu.edu/~hhan
> 