[R] anyone know why package "randomForest" na.roughfix is so slow??

Liaw, Andy andy_liaw at merck.com
Thu Jul 1 17:58:58 CEST 2010


You have not posted any code showing exactly how you call na.roughfix(),
so I can only guess.

If you are doing something like:

  randomForest(y ~ ., mybigdata, na.action=na.roughfix, ...)

I would not be surprised if that takes a very long time on large datasets.
The slowness is most likely caused by the formula interface, not by
na.roughfix() itself.

If that is what you are doing, try doing the imputation beforehand and
running randomForest() afterward, e.g.,

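  library(randomForest)
  # Impute once, up front, on the whole data frame
  myroughfixed <- na.roughfix(mybigdata)
  # Then call the x/y (non-formula) interface
  randomForest(myroughfixed[list.of.predictor.columns],
               myroughfixed[[myresponse]], ...)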

HTH,
Andy

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Mike Williamson
Sent: Wednesday, June 30, 2010 7:53 PM
To: r-help
Subject: [R] anyone know why package "randomForest" na.roughfix is so slow??

Hi all,

    I am using the package "randomForest" for random forest predictions.
I like the package. However, I have fairly large data sets, and it can
often take *hours* just to get through the na.roughfix() call, which
simply goes through the data and replaces any NA values with either the
median (numeric data) or the most frequent level (factors).
    I am going to start comparing na.roughfix() with some apply()-based
functions which, it seems, can do the same job more quickly. But I
hesitate to duplicate a function that is already in the package, since I
presume na.roughfix() is meant to be as fast as possible and is also well
"tailored" to the requirements of random forest.
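For comparison, here is a minimal sketch of the column-wise approach described above. It assumes a data frame whose columns are all numeric or factor; rough_fix() is a hypothetical helper written for illustration, not the randomForest package's own implementation of na.roughfix(). Ties among factor levels are broken by which.max(), i.e. the first level in level order wins.

```r
# Hypothetical helper (not part of randomForest): rough imputation in the
# spirit of na.roughfix(). Numeric columns get their median, factor
# columns get their most frequent level.
rough_fix <- function(df) {
  stopifnot(is.data.frame(df))
  # lapply() over columns keeps each column's type; apply() would not.
  df[] <- lapply(df, function(col) {
    miss <- is.na(col)
    if (!any(miss)) return(col)          # nothing to fill in
    if (is.numeric(col)) {
      col[miss] <- median(col, na.rm = TRUE)
    } else if (is.factor(col)) {
      counts <- table(col)               # table() excludes NA by default
      col[miss] <- names(counts)[which.max(counts)]
    } else {
      stop("rough_fix: only numeric and factor columns are supported")
    }
    col
  })
  df
}

# Small usage example
d <- data.frame(x = c(1, NA, 3, 10),
                f = factor(c("a", "b", NA, "b")))
fixed <- rough_fix(d)
fixed$x  # values: 1 3 3 10 (median of 1, 3, 10 is 3)
fixed$f  # values: a b b b ("b" is the most frequent level)
```

The sketch uses lapply() rather than apply() because apply() first converts a data frame to a matrix, which turns a mix of numeric and factor columns into a character matrix.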

    Has anyone else found this to be really slow? (I haven't found
rfImpute() to be nearly as slow, but I cannot say for sure: my "predict"
data sets are MUCH larger than my model data sets, so cleaning the
prediction data set simply takes much longer.)
    If so, any ideas how to speed this up?

                              Thanks!
                                   Mike





______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.