[R] Interpretation of randomForest results
Liaw, Andy
andy_liaw at merck.com
Tue Jan 18 14:12:23 CET 2005
> From: luk
>
> I got the following results when I ran randomForest with the
> commands below:
>
> library(randomForest)
> qair <- read.table("train10.dat", header = TRUE)
> oz.rf <- randomForest(LESION ~ ., data = qair, ntree = 220,
>                       importance = TRUE)
> print(oz.rf)
>
> Call:
> randomForest.formula(x = LESION ~ ., data = qair, ntree =
> 220, importance = TRUE)
> Type of random forest: classification
> Number of trees: 220
> No. of variables tried at each split: 2
> OOB estimate of error rate: 15.86%
  ^^^
Note what that says ("OOB"); it applies to the confusion matrix below as well.
> Confusion matrix:
>        lesion noninf class.error
> lesion   3949    525   0.1173447
> noninf    894   3580   0.1998212
>
> What does this mean? Is 11.7% the classification error for the
> 'lesion' class, and 19.98% the classification error for the
> 'noninf' class, in the training set?
The results you showed above are out-of-bag (OOB) results. If you don't
know what that means, you should read the documentation, and perhaps the
references.
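
For readers checking the arithmetic, here is a minimal sketch of how the
class.error column and the overall OOB error rate follow from the counts in
the confusion matrix quoted above. It uses base R only; the object names
(conf, class.error, oob.error) are illustrative and not part of randomForest.

```r
# Counts copied from the OOB confusion matrix quoted above
# (rows = observed class, columns = OOB-predicted class).
conf <- matrix(c(3949,  525,
                  894, 3580),
               nrow = 2, byrow = TRUE,
               dimnames = list(observed  = c("lesion", "noninf"),
                               predicted = c("lesion", "noninf")))

# Per-class error: share of each observed class that was misclassified.
class.error <- 1 - diag(conf) / rowSums(conf)

# Overall OOB error: all off-diagonal counts over all cases.
oob.error <- 1 - sum(diag(conf)) / sum(conf)

print(round(class.error, 7))      # lesion 0.1173447, noninf 0.1998212
print(round(100 * oob.error, 2))  # 15.86
```

So the 11.7% and 19.98% figures are per-class error rates, but they are
computed from out-of-bag predictions, not from predictions on cases the
trees were grown on.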
> But then I ran the commands below to test the classification
> performance on the same training set:
>
> ntrain <- read.table("train10.dat", header = TRUE)
> ntrain.pred <- predict(oz.rf, ntrain)
> table(observed = ntrain[, "LESION"], predicted = ntrain.pred)
>
> I got the following results. It seems that the classification
> error rates for the 'lesion' and 'noninf' classes are 0.
> Any suggestions would be much appreciated.
randomForest is rather good at overfitting _training_ data, but that's
(usually) not a problem in classification. What one usually cares about is
the _test set_ performance. There, randomForest performance does not
degrade as the number of trees increases, and that's what Breiman meant by
`random forests do not overfit'.
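
The distinction can be sketched in a few lines. This is only an illustration
under stated assumptions: train10.dat is not available, so the built-in iris
data stands in for it, and the 100/50 split, the seed, and all object names
are arbitrary choices. It relies on the documented behaviour that predict()
on a randomForest object without newdata returns the out-of-bag predictions,
whereas passing the training data as newdata runs every case through all
trees, including those grown on it.

```r
library(randomForest)

set.seed(2005)  # arbitrary seed, only for reproducibility

# Stand-in data: iris ships with R; the original train10.dat is not available.
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

fit <- randomForest(Species ~ ., data = train, ntree = 220)

# 1. OOB predictions: no newdata, so each case is predicted only by the
#    trees whose bootstrap sample did not contain it.
oob.pred <- predict(fit)

# 2. Resubstitution: training rows are pushed back through every tree,
#    including the trees that were grown on them.
resub.pred <- predict(fit, train)

# 3. Held-out rows that were never used in fitting.
test.pred <- predict(fit, test)

err <- c(oob   = mean(oob.pred   != train$Species),
         resub = mean(resub.pred != train$Species),
         test  = mean(test.pred  != test$Species))
print(err)

# OOB error after 1, 2, ..., 220 trees, for checking how it behaves
# as trees are added.
oob.by.ntree <- fit$err.rate[, "OOB"]
```

The resubstitution error is typically at or near zero, as in the table from
the original question, while the OOB and held-out errors are the ones worth
comparing. plot(oob.by.ntree, type = "l") gives a quick look at the error as
a function of the number of trees.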
Andy
>
>           predicted
> observed lesion noninf
>   lesion   4474      0
>   noninf      0   4474