[R] memory problems when combining randomForests
andy_liaw at merck.com
Thu Jul 27 20:59:30 CEST 2006
From: Eleni Rapsomaniki
> I'm using R (windows) version 2.1.1, randomForest version 4.15.
Never seen such a version...
> I call randomForest like this:
> importance=TRUE,proximity=FALSE, keep.forest=TRUE)
> (where train.df and test.df are my train and test
> data.frames and response_index is the column number
> specifying the class)
> I then save each tree to a file so I can combine them all
> afterwards. There are no memory issues when
> keep.forest=FALSE. But I think that's the bit I need for
> future predictions (right?).
Yes, but what is your question? (Do you mean each *forest*,
instead of each *tree*?)
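[A sketch of the save-then-combine workflow being discussed, using randomForest::combine(); the data set, file names, and ntree values below are illustrative, not from the original post:]

```r
library(randomForest)

# Train several forests on the same data (in practice, perhaps on
# subsets that each fit in memory), saving each *forest* to disk.
# keep.forest=TRUE is required if the forests will be used for
# prediction later.
data(iris)
files <- character(3)
for (i in 1:3) {
  rf <- randomForest(Species ~ ., data = iris, ntree = 100,
                     keep.forest = TRUE)
  files[i] <- sprintf("rf_part_%d.rds", i)
  saveRDS(rf, files[i])
}

# Reload the pieces and merge them into one 300-tree forest.
parts  <- lapply(files, readRDS)
big.rf <- do.call(combine, parts)
print(big.rf$ntree)
```

Note that combine() merges the trees but does not recompute the out-of-bag error, so the combined object's error estimates should not be trusted; predict() on new data works as usual.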
> I did check previous messages on memory issues, and thought
> that combining the trees afterwards would solve the problem.
> Since my cross-validation subsets give me a fairly stable
> error-rate, I suppose I could just use a randomForest trained
> on just a subset of my data. But would I not be "wasting"
> data this way?
Perhaps, but see Jerry Friedman's ISLE, where he argued
that RF with very small trees grown on small random samples
can sometimes give even better results.
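[In randomForest() the ISLE-style setup can be approximated with the sampsize and maxnodes arguments; the specific values below are illustrative guesses, not a recommendation from the post:]

```r
library(randomForest)

data(iris)
# Many small trees, each grown on a small random subsample:
# sampsize limits the per-tree sample, maxnodes limits tree size.
small.rf <- randomForest(Species ~ ., data = iris,
                         ntree    = 500,
                         sampsize = 30,   # small random sample per tree
                         maxnodes = 4)    # restrict each tree's size
print(small.rf)
```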
> A bit off the subject, but should the order at which at rows
> (ie. sets of explanatory variables) are passed to the
> randomForest function affect the result? I have noticed that
> if I pick a random unordered sample from my control data for
> training, the error rate is much lower than if I take an
> ordered sample. This remains true for all my cross-validation
> subsets.
I'm not sure I understand. In randomForest() (as in other
functions) variables are in columns, rather than rows, so
are you talking about variables (columns) in a different order,
or data (rows) in a different order?
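[One possible explanation for the observation, offered as a guess: if the rows happen to be sorted by class, taking the first N rows gives a training set that misses some classes entirely, while a random sample preserves the class mix. The iris data below is just a stand-in to illustrate:]

```r
data(iris)                        # iris rows are sorted by Species
ordered.idx <- 1:100              # "ordered" sample: first two classes only
random.idx  <- sample(nrow(iris), 100)

table(iris$Species[ordered.idx])  # one class completely absent
table(iris$Species[random.idx])   # all three classes represented
```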
> I'm sorry for my many questions.
> Many Thanks
> Eleni Rapsomaniki