[R] memory problems when combining randomForests

Liaw, Andy andy_liaw at merck.com
Fri Jul 28 15:28:09 CEST 2006

From: Eleni Rapsomaniki
> Hi Andy, 
> > > I'm using R (windows) version 2.1.1, randomForest version 4.15. 
> >                                        ^^^^^^^^^^^^^^^^^^^^^^^^^ 
> > Never seen such a version...
> Ooops! I meant 4.5-15
> > > I then save each tree to a file so I can combine them all 
> > > afterwards. There are no memory issues when 
> > > keep.forest=FALSE. But I think that's the bit I need for 
> > > future predictions (right?). 
> > 
> > Yes, but what is your question?  (Do you mean each *forest*,
> > instead of each *tree*?)
> I mean the component of the object that is created from 
> randomForest that has the name "forest" (and takes up all 
> the memory!). 

Yes, the forest can take up quite a bit of space.  You might 
consider setting nodesize larger and seeing whether that gives you 
sufficient space savings without compromising prediction performance.
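
A minimal sketch of that comparison, assuming the randomForest package is installed. The built-in iris data, ntree = 100, nodesize = 20 and the helper fit_forest() are illustrative choices only and do not come from the original thread. It also shows that forests grown in separate runs with keep.forest=TRUE can be merged with combine() and used for prediction.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(1)

# Hypothetical helper: grow a small forest on iris with a given nodesize,
# keeping the "forest" component so the object can be used for prediction.
fit_forest <- function(nodesize) {
  randomForest(Species ~ ., data = iris, ntree = 100,
               nodesize = nodesize, keep.forest = TRUE)
}

rf_default <- fit_forest(1)   # 1 is the default nodesize for classification
rf_large   <- fit_forest(20)  # larger terminal nodes give shallower trees

# Memory used by the stored trees, in bytes
size_default <- as.numeric(object.size(rf_default$forest))
size_large   <- as.numeric(object.size(rf_large$forest))

# Out-of-bag error after the last tree, to check what the saving costs
oob_default <- rf_default$err.rate[rf_default$ntree, "OOB"]
oob_large   <- rf_large$err.rate[rf_large$ntree, "OOB"]

cat("forest bytes:", size_default, "vs", size_large, "\n")
cat("OOB error:   ", oob_default, "vs", oob_large, "\n")

# Forests grown separately can be merged afterwards for prediction.
rf_more <- fit_forest(20)
rf_all  <- combine(rf_large, rf_more)
pred    <- predict(rf_all, iris)
```

Whether the space saving is worth any loss in accuracy depends on the data, so the two OOB error rates should be compared on your own problem rather than on this toy example.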

> > > A bit off the subject, but should the order in which rows 
> > > (ie. sets of explanatory variables) are passed to the 
> > > randomForest function affect the result? I have noticed that 
> > > if I pick a random unordered sample from my control data for 
> > > training the error rate is much lower than if I take an 
> > > ordered sample. This remains true for all my cross-validation 
> > > results. 
> > 
> > I'm not sure I understand.  In randomForest() (as in other
> > functions) variables are in columns, rather than rows, so
> > are you talking about variables (columns) in different order 
> > or data (rows) in different order?
> Yes, sorry I confused you. I mean the order in which data 
> (rows) are passed, not columns.

Then I'm not sure what you mean by difference in performance, even
in cross-validation.  Perhaps you can show an example?  Each 
tree in the forest is grown on a random sample from the data, so
the order of the rows cannot matter.
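
One way to check this is sketched below, assuming the randomForest package is installed. The iris data (whose rows are sorted by species), the seed and ntree = 500 are illustrative choices, not taken from the thread. The same rows are passed in their original order and in shuffled order, and the out-of-bag error rates are compared. Any small gap should reflect only the random sampling inside the algorithm. Note that this is a different situation from training on two different subsets of a data set, such as the first n rows of an ordered table versus n randomly chosen rows.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(42)

# Same observations, different row order. iris is sorted by Species.
shuffled <- iris[sample(nrow(iris)), ]

rf_ordered  <- randomForest(Species ~ ., data = iris,     ntree = 500)
rf_shuffled <- randomForest(Species ~ ., data = shuffled, ntree = 500)

# Out-of-bag error after the last tree for each fit
oob_ordered  <- rf_ordered$err.rate[rf_ordered$ntree, "OOB"]
oob_shuffled <- rf_shuffled$err.rate[rf_shuffled$ntree, "OOB"]

cat("OOB error, original order:", oob_ordered, "\n")
cat("OOB error, shuffled order:", oob_shuffled, "\n")
```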

> Finally, I see from
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#inter
> that there is a component in Breiman's implementation of 
> randomForest that computes interactions between parameters. 
> Has this been implemented in R yet?

No.  Prof. Breiman told me that it is very experimental, and he
wouldn't mind if it doesn't make it into the R package.  
Since I have other priorities for the package, that naturally
went on the back burner.


> Many thanks for your time and help.
> Eleni Rapsomaniki
