[R] Question about randomForest

Sun Nov 27 01:44:04 CET 2011

Hi Matthew,

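The error rate reported by randomForest is the prediction error based
on out-of-bag (OOB) data: each observation is predicted using only the
trees whose bootstrap sample did not include it. Each bootstrap sample
contains about 63% of the distinct original observations, so roughly a
third of the trees are OOB for any given case. When you pass the
training data back to predict(), every case is run through all the
trees, including those that were grown on it, so that error is
optimistically low. The OOB error rate is therefore higher than the
error on the original data, as you observed, and it is the more
honest estimate of the two.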
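
In case it helps, here is a minimal sketch of the difference. It
assumes the randomForest package is installed and uses the built-in
iris data as a stand-in for your data; the variable names (oob_err,
resub_err, and so on) are mine, for illustration only.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only so the run is repeatable
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# Stored predictions are out-of-bag: each row is predicted only by
# the trees whose bootstrap sample did not contain that row.
oob_err <- mean(rf$predicted != iris$Species)

# With newdata, every row goes through all the trees, including the
# ones that were grown on it, so this error is optimistic.
resub_err <- mean(predict(rf, newdata = iris) != iris$Species)

# Without newdata, predict() returns the OOB predictions instead.
same_as_stored <- all(as.character(predict(rf)) ==
                      as.character(rf$predicted))

# Average fraction of distinct rows in a bootstrap sample of size n.
n <- nrow(iris)
frac_in_bag <- mean(replicate(
  2000, length(unique(sample(n, n, replace = TRUE))) / n))

print(c(oob = oob_err, resubstitution = resub_err))
print(same_as_stored)
print(frac_in_bag)
```

So to reproduce the stored predictions, call predict(rf) without
newdata; to get an honest error estimate from predict(rf, newdata = ...),
use data that were not used to grow the forest.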

Weidong

On Sat, Nov 26, 2011 at 3:02 PM, Matthew Francis
<mattjamesfrancis at gmail.com> wrote:
> I've been using the R package randomForest but there is an aspect I
> cannot work out the meaning of. After calling the randomForest
> function, the returned object contains an element called predicted,
> which is the prediction obtained using all the trees (at least that's
> my understanding). I've checked that this prediction set has the error
> rate as reported by err.rate.
>
> However, if I send the training data back into the
> predict.randomForest function I find I get a different result to the
> stored set of predictions. This is true for both classification and
> regression. I find the predictions obtained this way also have a much
> lower error rate and perform very well (suspiciously well...) on
> measures such as AUC.
>
> My understanding is that the two predictions above should be the same.
> Since they are not, I must be misunderstanding something.
> Any ideas what's going on?