[R] Random Forest: OOB performance = test set performance?
thebudget72 m@iii@g oii gm@ii@com
Sun Apr 11 05:48:46 CEST 2021
Hi ML,
For a random forest, I thought that the out-of-bag (OOB) performance should be
the same as (or at least very similar to) the performance calculated on a
separate test set.
But this does not seem to be the case.
In the following code, the accuracy computed on the out-of-bag samples is
77.81%, while the accuracy computed on a separate test set is 81%.
Can you please check what I am doing wrong?
Thanks in advance and best regards.
library(randomForest)
library(ISLR)

# Binary response: "No" when Sales is at most 8, "Yes" otherwise
Carseats$High <- ifelse(Carseats$Sales <= 8, "No", "Yes")
Carseats$High <- as.factor(Carseats$High)

# Random 200-row training subset; the remaining rows form the test set
train <- sample(1:nrow(Carseats), 200)

rf <- randomForest(High ~ . - Sales,
                   data = Carseats,
                   subset = train,
                   mtry = 6,
                   importance = TRUE)

# Accuracy from the OOB confusion matrix stored in the fitted object
acc <- (rf$confusion[1, 1] + rf$confusion[2, 2]) / sum(rf$confusion)
print(paste0("Accuracy OOB: ", round(acc * 100, 2), "%"))

# Accuracy on the held-out rows
yhat <- predict(rf, newdata = Carseats[-train, ])
y <- Carseats[-train, ]$High
conftest <- table(y, yhat)
acctest <- (conftest[1, 1] + conftest[2, 2]) / sum(conftest)
print(paste0("Accuracy test set: ", round(acctest * 100, 2), "%"))
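One detail in the OOB calculation above may be worth checking. For classification, rf$confusion holds the class counts plus a trailing class.error column, so sum(rf$confusion) adds those error rates to the denominator. With 200 training rows an accuracy based on counts alone would be a multiple of 0.5%, which 77.81% is not. The sketch below illustrates the arithmetic on a toy matrix with the same layout. The helper oob_accuracy and the counts are hypothetical, made up for illustration, and are not results from the Carseats fit. The effect is small and would not by itself account for a gap of a few percentage points.

```r
# Hypothetical helper (not part of randomForest): accuracy from a confusion
# matrix laid out like rf$confusion, i.e. class counts plus a trailing
# "class.error" column that should not enter the total.
oob_accuracy <- function(conf) {
  counts <- conf[, colnames(conf) != "class.error", drop = FALSE]
  sum(diag(counts)) / sum(counts)
}

# Made-up counts for illustration only (rows = observed, columns = predicted)
toy <- cbind(No = c(100, 20), Yes = c(15, 65))
rownames(toy) <- c("No", "Yes")
toy <- cbind(toy, class.error = c(15 / 115, 20 / 85))

# Summing the whole matrix also adds the two error rates to the denominator
acc_with_error_col <- (toy[1, 1] + toy[2, 2]) / sum(toy)

# Summing only the count columns gives 165 correct out of 200 cases
acc_counts_only <- oob_accuracy(toy)

print(c(with_error_col = acc_with_error_col, counts_only = acc_counts_only))
```

On the fitted forest the same helper would be called as oob_accuracy(rf$confusion), assuming the object was fitted for classification as in the code above.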
More information about the R-help mailing list