[R] na.action in randomForest --- Summary
David Parkhurst
parkhurs at ariel.ucs.indiana.edu
Tue Aug 5 21:31:03 CEST 2003
A few days ago I asked whether there were options other than
na.action=na.fail for the R port of Breiman’s randomForest; the function’s
help page did not say anything about other options.
I have since discovered that a PDF document called “The randomForest
Package”, made available by Andy Liaw (who made the tool available in
R; thank you), does discuss an option. It is an implementation of Breiman’s
suggestion “to replace each missing value by the median of its column and
each missing categorical by the most frequent value in that categorical. My
impression is that because of the randomness and the many trees grown,
filling in missing values with a sensible values does not effect accuracy
much.” (from his report, "Manual On Setting Up, Using, And Understanding
Random Forests V3.1").
I now plan to try the na.roughfix option from Liaw’s package.
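
For anyone wanting to see what that rule amounts to, here is a minimal base-R sketch of the median / most-frequent-value fill described in the quotation. The helper name rough_fix_sketch and the small data frame are made up for illustration; this is not the package's own code, and the package's na.roughfix may differ in details such as tie handling. The guarded call at the end assumes the randomForest package is installed and that its formula interface accepts na.action = na.roughfix.

```r
# Hypothetical helper: fill numeric NAs with the column median and
# factor NAs with the most frequent level (an illustration of the rule,
# not the randomForest implementation).
rough_fix_sketch <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    missing <- is.na(x)
    if (!any(missing)) next
    if (is.numeric(x)) {
      x[missing] <- median(x, na.rm = TRUE)
    } else if (is.factor(x)) {
      counts <- table(x)  # table() leaves NAs out by default
      x[missing] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- x
  }
  df
}

# Made-up example data with one missing value in each column.
d <- data.frame(
  height = c(1, NA, 3, 2, 10),
  colour = factor(c("red", "blue", NA, "red", "red"))
)
fixed <- rough_fix_sketch(d)
print(fixed)

# With the package itself (only runs if randomForest is installed):
if (requireNamespace("randomForest", quietly = TRUE)) {
  iris_na <- iris
  iris_na[c(1, 51, 101), "Sepal.Length"] <- NA
  fit <- randomForest::randomForest(Species ~ ., data = iris_na,
                                    na.action = randomForest::na.roughfix)
  print(fit)
}
```

In the example data the missing height is filled with the median of the observed heights, and the missing colour with the most common level.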
Thanks to Uwe Ligges and Brian Ripley for their replies to my posting.
Dave Parkhurst