[R] memory problems when combining randomForests [Broadcast]
Eleni Rapsomaniki
e.rapsomaniki at mail.cryst.bbk.ac.uk
Thu Jul 27 17:07:55 CEST 2006
I'm using R (Windows) version 2.1.1 and randomForest version 4.15.
I call randomForest like this:
my.rf=randomForest(x=train.df[,-response_index], y=train.df[,response_index],
                   xtest=test.df[,-response_index], ytest=test.df[,response_index],
                   importance=TRUE, proximity=FALSE, keep.forest=TRUE)
(where train.df and test.df are my train and test data.frames and
response_index is the column number specifying the class)
I then save each tree to a file so I can combine them all afterwards. There are
no memory issues when keep.forest=FALSE, but I think the stored forest is the
bit I need for future predictions (right?).
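For reference, the save-then-combine workflow described above might look roughly like the sketch below. It is only an illustration under assumptions: the built-in iris data stands in for train.df, the file names, ntree=50 and the three repetitions are made up, and xtest/ytest are left out to keep it short. It relies on combine() from the randomForest package, which merges forests grown on the same predictors.

```r
library(randomForest)

set.seed(1)
# Toy data standing in for train.df (assumption); the class is column 5.
train.df <- iris
response_index <- 5

# Grow several small forests and write each one to its own file.
files <- character(0)
for (i in 1:3) {
  rf <- randomForest(x = train.df[, -response_index],
                     y = train.df[, response_index],
                     ntree = 50, keep.forest = TRUE)
  f <- file.path(tempdir(), paste("rf_", i, ".RData", sep = ""))
  save(rf, file = f)
  files <- c(files, f)
  rm(rf)  # drop the forest from memory before growing the next one
}

# Reload the saved forests one by one and merge them into a single forest.
forests <- lapply(files, function(f) {
  e <- new.env()
  load(f, envir = e)
  get("rf", envir = e)
})
big.rf <- do.call(randomForest::combine, forests)

# The combined forest holds all 150 trees and can be used for prediction.
pred <- predict(big.rf, train.df[, -response_index])
```

Note that, according to the combine() help page, the combined object no longer carries the err.rate and confusion components, so error estimates would have to be recomputed separately.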
I did check previous messages on memory issues, and thought that
combining the trees afterwards would solve the problem. Since my
cross-validation subsets give me a fairly stable error-rate, I suppose I could
just use a randomForest trained on just a subset of my data. But would I not be
"wasting" data this way?
A bit off the subject, but should the order in which rows (i.e. sets of
explanatory variables) are passed to the randomForest function affect the
result? I have noticed that if I pick a random unordered sample from my control
data for training, the error rate is much lower than if I take an ordered
sample. This remains true for all my cross-validation results.
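One possible way to check whether an ordered sample is simply less representative than a random one is to compare the class counts of the two samples. The sketch below is a guess at such a check, not a diagnosis: it uses iris sorted by class as a stand-in for an ordered data frame, and the sample size of 60 is arbitrary.

```r
set.seed(1)
# Toy data sorted by class, standing in for an ordered data frame (assumption).
df <- iris[order(iris$Species), ]
n <- 60

ordered.idx <- 1:n                  # the first n rows, in stored order
random.idx  <- sample(nrow(df), n)  # n rows drawn at random

# Class counts in each candidate training set.
ordered.counts <- table(df$Species[ordered.idx])
random.counts  <- table(df$Species[random.idx])
print(ordered.counts)
print(random.counts)
```

In this toy case the ordered sample contains no rows at all from the last class, so a forest trained on it could not predict that class. Whether something similar happens with the real data depends on how the rows are ordered.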
I'm sorry for my many questions.
Many Thanks
Eleni Rapsomaniki