[R] Random Forest: OOB performance = test set performance?

Sun Apr 11 05:48:46 CEST 2021

Hi ML,

For random forests, I thought that the out-of-bag (OOB) performance 
should be the same as (or at least very similar to) the performance 
calculated on a separate test set.

But this does not seem to be the case.

In the following code, the accuracy computed on the out-of-bag samples 
is 77.81%, while the one computed on a separate test set is 81%.
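
For scale, here is a rough sketch of how much sampling noise to expect 
in each figure. It assumes a simple binomial standard error, n = 200 
cases for both estimates (Carseats has 400 rows, 200 of them used for 
training), and that the two estimates can be treated as independent. 
The helper name acc_se is my own and not part of any package.

```r
# Rough binomial standard error of an accuracy estimate p on n cases.
# Assumption: cases are independent; n = 200 for both estimates.
acc_se <- function(p, n) sqrt(p * (1 - p) / n)

se_oob  <- acc_se(0.7781, 200)   # OOB accuracy reported above
se_test <- acc_se(0.81, 200)     # test set accuracy reported above

gap <- 0.81 - 0.7781
# Standard error of the difference, treating the estimates as independent
se_gap <- sqrt(se_oob^2 + se_test^2)

print(round(c(se_oob = se_oob, se_test = se_test,
              gap = gap, se_gap = se_gap), 4))
```

Under these assumptions the gap comes out smaller than one standard 
error of the difference, so a single split of this size may not be 
enough to tell the two estimates apart.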

Can you please check what I am doing wrong?

Thanks in advance and best regards.

library(randomForest)
library(ISLR)

Carseats$High <- ifelse(Carseats$Sales<=8,"No","Yes")
Carseats$High <- as.factor(Carseats$High)

train <- sample(1:nrow(Carseats), 200)

rf <- randomForest(High ~ . - Sales,
                   data = Carseats,
                   subset = train,
                   mtry = 6,
                   importance = TRUE)

# rf$confusion has a third column (class.error); sum only the two count columns
acc <- (rf$confusion[1,1] + rf$confusion[2,2]) / sum(rf$confusion[,1:2])
print(paste0("Accuracy OOB: ", round(acc*100,2), "%"))

yhat <- predict(rf, newdata=Carseats[-train,])
y <- Carseats[-train,]$High
conftest <- table(y, yhat)
acctest <- (conftest[1,1] + conftest[2,2]) / sum(conftest)
print(paste0("Accuracy test set: ", round(acctest*100,2), "%"))
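
A note on the OOB figure: the confusion element of a randomForest fit 
holds a class.error column next to the two count columns, so a sum over 
the whole matrix is slightly larger than the number of cases. Here is a 
minimal base-R sketch with made-up counts (hypothetical, not the 
Carseats result) to show the size of the effect:

```r
# Hypothetical counts for 200 cases, laid out like rf$confusion:
# rows = true class, columns = predicted class, plus class.error.
counts <- matrix(c(100, 18,
                    26, 56),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("No", "Yes"), c("No", "Yes")))
class.error <- 1 - diag(counts) / rowSums(counts)
confusion <- cbind(counts, class.error)

correct <- sum(diag(counts))

# Denominator includes the class.error column (slightly more than 200)
acc_all_cells <- correct / sum(confusion)
# Denominator restricted to the count columns (exactly 200)
acc_counts <- correct / sum(confusion[, 1:2])

print(c(all_cells = acc_all_cells, counts_only = acc_counts))
```

With these counts the counts-only accuracy is 156/200 = 0.78, and the 
all-cells version is a little lower. The effect is small, so it would 
account for only a fraction of a percentage point.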