[R] confusion matrix in randomForest
Miklos Kiss
mzkiss at gmail.com
Sun Jul 20 04:46:37 CEST 2008
I have a question about the output generated by randomForest in classification
mode, specifically the confusion matrix. The confusion matrix lists the
various classes, how the forest classified each one, and the
classification error. Are these numbers essentially averages over all the
trees in the forest? If so, is there a way I can get the standard deviations
out of the randomForest object, or do I have to evaluate each tree
individually? By way of illustration, here is the confusion matrix
for the iris data. The output below shows that the forest correctly
classified 47 versicolor irises, but this is the result for the entire
forest. I'd like to know whether every tree would also correctly classify 47
versicolor irises; I suspect not. The same goes for the class.error
values: not every tree will have exactly those values, right?
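Per-tree figures do not seem to be stored in the fitted object (`err.rate`, for instance, holds the OOB error of the forest made of the first i trees, not of tree i alone), so evaluating each tree individually seems necessary. Below is a minimal sketch, assuming the randomForest package is installed and refitting with `keep.inbag = TRUE` (not in my original call) so that each tree's OOB cases are known. The seed and variable names are arbitrary choices of mine:

```r
# Sketch: per-tree OOB error rates (overall and per class) for a
# randomForest classifier on iris. Assumes the randomForest package is
# installed; keep.inbag = TRUE records which cases each tree was fit on.
library(randomForest)

set.seed(42)  # arbitrary seed, only for reproducibility
iris.rf <- randomForest(Species ~ ., data = iris, ntree = 500,
                        keep.forest = TRUE, keep.inbag = TRUE)

# predict.all = TRUE keeps every tree's prediction in $individual,
# an n x ntree matrix of class labels (as character strings).
votes <- predict(iris.rf, newdata = iris, predict.all = TRUE)$individual
truth <- iris$Species

# inbag[i, j] counts how often case i was drawn for tree j; 0 means OOB.
per.tree.err <- sapply(seq_len(iris.rf$ntree), function(j) {
  oob <- iris.rf$inbag[, j] == 0
  mean(votes[oob, j] != as.character(truth[oob]))
})

# Per-class OOB error for each tree: a 3 x ntree matrix
# (NA if a class happens to have no OOB cases for some tree).
per.tree.class.err <- sapply(seq_len(iris.rf$ntree), function(j) {
  oob <- iris.rf$inbag[, j] == 0
  tapply(votes[oob, j] != as.character(truth[oob]), truth[oob], mean)
})

# Mean and standard deviation across trees.
c(mean = mean(per.tree.err), sd = sd(per.tree.err))
apply(per.tree.class.err, 1, sd, na.rm = TRUE)
```

These numbers describe the variability of single trees, not of the forest's aggregated OOB estimate.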
But this raises another question. For this example I used the entire data
set to grow the forest, so I assume the confusion matrix is based on the
OOB data. If I instead split the data into a training set and a test set
and evaluated each tree individually on the test set, I could get averages
and standard deviations of the error rate.
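A minimal sketch of that training/test idea, assuming the randomForest package and a hypothetical random 100/50 split (my own choice, not something randomForest prescribes), again relying on `predict.all = TRUE` to get each tree's predictions:

```r
# Sketch: per-tree error rates on a held-out test set.
# Assumes the randomForest package; the 100/50 split is arbitrary.
library(randomForest)

set.seed(1)  # arbitrary seed for a reproducible split
train.idx <- sample(nrow(iris), 100)
train <- iris[train.idx, ]
test  <- iris[-train.idx, ]

rf <- randomForest(Species ~ ., data = train, ntree = 500)
pred <- predict(rf, newdata = test, predict.all = TRUE)

# Each column of pred$individual holds one tree's labels for the test cases;
# the comparison recycles the truth vector down each column.
test.err <- colMeans(pred$individual != as.character(test$Species))

# Mean and standard deviation across trees, plus the forest's own test error.
c(mean = mean(test.err), sd = sd(test.err))
mean(pred$aggregate != test$Species)
```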
Any thoughts? Thanks in advance.
-Miklos Z. Kiss
> print(iris.rf)
Call:
randomForest(formula = Species ~ ., data = iris, importance = TRUE,
keep.forest = TRUE)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 5.33%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          5        45        0.10
--
View this message in context: http://www.nabble.com/confusion-matrix-in-randomForest-tp18550873p18550873.html
Sent from the R help mailing list archive at Nabble.com.