[R] memory problems when combining randomForests
Liaw, Andy
andy_liaw at merck.com
Fri Jul 28 15:28:09 CEST 2006
From: Eleni Rapsomaniki
>
> Hi Andy,
>
> > > I'm using R (windows) version 2.1.1, randomForest version 4.15.
> > ^^^^^^^^^^^^^^^^^^^^^^^^^
> > Never seen such a version...
> Ooops! I meant 4.5-15
>
> > > I then save each tree to a file so I can combine them all
> > > afterwards. There are no memory issues when
> > > keep.forest=FALSE. But I think that's the bit I need for
> > > future predictions (right?).
> >
> > Yes, but what is your question? (Do you mean each *forest*,
> > instead of each *tree*?)
> I mean the component of the object created by randomForest that is
> named "forest" (and takes up all the memory!).
Yes, the forest can take up quite a bit of space. You might
consider setting nodesize larger to see whether that saves enough
space without compromising prediction performance.
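
A minimal sketch of this suggestion, assuming the randomForest package
is installed. The built-in iris data stands in for the real data, and
the nodesize values (1 and 10) and ntree = 200 are arbitrary choices for
illustration, not recommendations:

```r
library(randomForest)

set.seed(1)

# nodesize = 1 is the default for classification (fully grown trees).
rf_small_nodes <- randomForest(Species ~ ., data = iris, ntree = 200,
                               nodesize = 1, keep.forest = TRUE)

# A larger nodesize stops splitting earlier, so each tree has fewer nodes.
rf_large_nodes <- randomForest(Species ~ ., data = iris, ntree = 200,
                               nodesize = 10, keep.forest = TRUE)

# Memory used by the "forest" component of each fit.
size_small <- object.size(rf_small_nodes$forest)
size_large <- object.size(rf_large_nodes$forest)
print(size_small, units = "Kb")
print(size_large, units = "Kb")

# Out-of-bag error after the last tree, to check what the saving costs.
oob_small <- unname(rf_small_nodes$err.rate[200, "OOB"])
oob_large <- unname(rf_large_nodes$err.rate[200, "OOB"])
print(c(nodesize_1 = oob_small, nodesize_10 = oob_large))
```

Whether the trade-off is acceptable depends on the data, so the two OOB
error rates should be compared before settling on a value.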
> > > A bit off the subject, but should the order in which rows
> > > (i.e. sets of explanatory variables) are passed to the
> > > randomForest function affect the result? I have noticed that
> > > if I pick a random unordered sample from my control data for
> > > training, the error rate is much lower than if I take an
> > > ordered sample. This remains true for all my cross-validation
> > > results.
> >
> > I'm not sure I understand. In randomForest() (as in other
> > functions) variables are in columns, rather than rows, so
> > are you talking about variables (columns) in different order
> > or data (rows) in different order?
>
> Yes, sorry I confused you. I mean the order in which data
> (rows) are passed, not columns.
Then I'm not sure what you mean by a difference in performance, even
in cross-validation. Perhaps you can show an example? Each
tree in the forest is grown on a random sample from the data, so
the order of the rows cannot matter.
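
One way to check this is sketched below, assuming the randomForest
package is installed and again using iris as a stand-in data set. The
helper oob_error() is hypothetical, written only for this illustration.
Note that it compares the same rows in two different orders, not two
different subsets of the data:

```r
library(randomForest)

set.seed(42)

# The same 150 rows, once sorted by class and once randomly permuted.
ordered_rows  <- iris[order(iris$Species), ]
shuffled_rows <- iris[sample(nrow(iris)), ]

# Hypothetical helper: fit a forest and return the final OOB error rate.
oob_error <- function(dat) {
  fit <- randomForest(Species ~ ., data = dat, ntree = 500)
  unname(fit$err.rate[fit$ntree, "OOB"])
}

err_ordered  <- oob_error(ordered_rows)
err_shuffled <- oob_error(shuffled_rows)

# The two rates should differ only by random variation between fits.
print(c(ordered = err_ordered, shuffled = err_shuffled))
```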
> Finally, I see from
> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#inter
> that there is a component in Breiman's implementation of
> randomForest that computes interactions between parameters. Has this
> been implemented in R yet?
No. Prof. Breiman told me that is very experimental, and he
wouldn't mind if that doesn't make it into the R package.
Since I have other priorities for the package, that naturally
went to the backburner.
Cheers,
Andy
> Many thanks for your time and help.
> Eleni Rapsomaniki