Gavin Simpson
gavin.simpson at ucl.ac.uk
Sun Apr 29 15:38:39 CEST 2007
No differences between runs for me on FC4 using R 2.4.1 and 2.5.0 with:
> require(randomForest)
Loading required package: randomForest
randomForest 4.5-18
*if* I reset the seed before each call to randomForest.
Your example code doesn't seem to reset the random seed before each
run. As a result, each run draws different random numbers, so the
bootstrap samples and the variables tried at each split differ between runs.
E.g. all runs give the same result when the seed is reset:
> set.seed(12)
> randomForest(Species ~ ., data=iris)
Call:
randomForest(formula = Species ~ ., data = iris)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
> set.seed(12)
> randomForest(x=iris[,1:4],y=iris[,5])
Call:
randomForest(x = iris[, 1:4], y = iris[, 5])
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
> set.seed(12)
> randomForest(x=iris[,c(1:4)],y=iris[,5])
Call:
randomForest(x = iris[, c(1:4)], y = iris[, 5])
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
> set.seed(12)
> randomForest(x=iris[,c(1,2,3,4)],y=iris[,5])
Call:
randomForest(x = iris[, c(1, 2, 3, 4)], y = iris[, 5])
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 2
OOB estimate of error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
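
If you would rather check this programmatically than compare printed output by eye, something along the following lines should do it. This is only a sketch: the helper fit_confusion() and its argument names are made up for illustration and are not part of randomForest; it simply resets the seed, fits the forest on iris via the x/y interface, and returns the OOB confusion matrix so that two fits can be compared with all.equal().

```r
library(randomForest)

# Illustrative helper (hypothetical name, not part of randomForest):
# reset the seed, fit on iris, and return the OOB confusion matrix.
fit_confusion <- function(seed, cols = 1:4) {
  set.seed(seed)  # reset the RNG so the same random numbers are drawn
  randomForest(x = iris[, cols], y = iris[, 5])$confusion
}

# Same seed, two equivalent ways of selecting the predictor columns
same_seed_a <- fit_confusion(12, 1:4)
same_seed_b <- fit_confusion(12, c(1, 2, 3, 4))

# Expected to be TRUE, since the seed is reset before each fit
print(isTRUE(all.equal(same_seed_a, same_seed_b)))

# Without a reset between calls, the second fit continues the RNG stream
# from where the first one stopped, so the two may (but need not) differ.
set.seed(12)
first  <- randomForest(x = iris[, 1:4], y = iris[, 5])$confusion
second <- randomForest(x = iris[, 1:4], y = iris[, 5])$confusion
print(isTRUE(all.equal(first, second)))
```

The last comparison is left without a stated result on purpose: on a data set as easy as iris, two forests grown from different random numbers can still end up with the same confusion matrix.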
