[R] Random Forest: how to do an automatic rerun using only the important variables
Liaw, Andy
andy_liaw at merck.com
Mon Apr 12 22:33:34 CEST 2004
That's the advantage of having an R interface to RF: you can do such
automation rather easily. E.g.,
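library(randomForest)
twoStageRF <- function(x, y, nVar=max(1, round(0.1*ncol(x))), ...) {
    ## Stage 1: fit on all variables; type=1 is the permutation-based importance.
    imp <- importance(randomForest(x, y, importance=TRUE, ...), type=1)[, 1]
    ## Importance of the nVar-th best variable (ties may keep a few extra).
    cutoff <- sort(imp, decreasing=TRUE)[nVar]
    ## Stage 2: refit on the selected columns only.
    randomForest(x[, imp >= cutoff, drop=FALSE], y, ...)
}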
[Disclaimer: I just wrote the function on the spot, so it is completely
untested. This is just to demonstrate how simple it would be. You can
embellish it as much as you'd like.]
I have written a function that uses CV to choose the `optimal' number of
variables to keep (rather than blindly select one up front). I might toss
it in the next version of the package...
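For illustration only, here is a minimal sketch of the CV idea. It is not
the function mentioned above: it assumes a randomForest release that
provides rfcv() and importance(), and it uses the built-in iris data purely
as a stand-in. The names cv, nBest, keep and fit are made up for the example.

```r
library(randomForest)
set.seed(1)

x <- iris[, 1:4]
y <- iris$Species

## Cross-validated error over a decreasing sequence of variable counts
## (step=0.5 roughly halves the number of variables at each step).
cv <- rfcv(x, y, cv.fold=5, step=0.5)

## Number of variables with the lowest cross-validated error.
nBest <- cv$n.var[which.min(cv$error.cv)]

## Rank variables by permutation importance and keep the top nBest.
imp <- importance(randomForest(x, y, importance=TRUE), type=1)[, 1]
keep <- names(sort(imp, decreasing=TRUE))[seq_len(nBest)]

## Final forest on the selected variables only.
fit <- randomForest(x[, keep, drop=FALSE], y)
```

Plotting cv$error.cv against cv$n.var is a quick way to see whether the
minimum is sharp or whether several variable counts do about equally well.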
HTH,
Andy
> From: Hui Han
>
> Hi,
>
> I am using the Random Forest Package, and want to do an automatic rerun
> using only those variables that were most important in the original run.
> Is there anybody who has experience with this and can give me helpful
> suggestions?
>
> Best regards,
>
> Hui Han
> Department of Computer Science and Engineering,
> The Pennsylvania State University
> University Park, PA,16802
> email: hhan at cse.psu.edu
> homepage: http://www.cse.psu.edu/~hhan
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
>
>