[R] random forest problem when calculating variable importance

Scott Gilpin sgilpin at gmail.com
Thu Oct 14 21:40:46 CEST 2004


Hi - 

When using the randomForest function for regression, I get different
mean-squared errors for the test-set predictions depending on whether
or not I ask it to calculate variable importance.  There is an
example below.  I looked briefly at the source code, but couldn't find
anything that would indicate why calculating variable importance would
(or should) change the predictions.

I'm using randomForest version 4.3-3 (the latest from CRAN), and have
tried R 1.9.0, 1.9.1 and 2.0.0 on Windows XP, and R 1.9.1 on Solaris 8.

Thanks,
Scott Gilpin

library(randomForest)

## Simulate training data: 100 observations on 10 predictors, of which
## only the first 5 have nonzero coefficients.
set.seed(2863)
x <- matrix(runif(1000), ncol = 10)
colnames(x) <- 1:10
beta <- matrix(c(1, 2, 3, 4, 5, 0, 0, 0, 0, 0), ncol = 1)
y <- drop(x %*% beta + rnorm(100))

## Independent test set generated in the same way.
newx <- matrix(runif(1000), ncol = 10)
newy <- drop(newx %*% beta + rnorm(100))

## Fit from the same seed without, and then with, variable importance;
## the test-set MSE after all 500 (default) trees is printed for each.
set.seed(2863)
rf.fit <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = FALSE)
print(rf.fit$test$mse[500])

set.seed(2863)
rf.fit <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = TRUE)
print(rf.fit$test$mse[500])
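
In case it helps to narrow things down, here is a minimal sketch of a
follow-up check (the object names rf.noimp and rf.imp are just for
illustration).  It keeps the two fits in separate objects, recomputes the
test-set MSE directly from the stored test predictions, and compares the
predictions element by element, so it should show whether the predictions
themselves differ or only the reported $test$mse component.

## Refit both models as above, keeping them in separate objects.
set.seed(2863)
rf.noimp <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = FALSE)
set.seed(2863)
rf.imp <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = TRUE)

## Recompute the test-set MSE from the aggregated test predictions ...
mean((newy - rf.noimp$test$predicted)^2)
mean((newy - rf.imp$test$predicted)^2)

## ... and check whether the predictions themselves are identical.
all.equal(rf.noimp$test$predicted, rf.imp$test$predicted)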



