[R] random forest problem when calculating variable importance
Scott Gilpin
sgilpin at gmail.com
Thu Oct 14 21:40:46 CEST 2004
Hi -
When using the randomForest function for regression, I get different
test-set mean-squared errors depending on whether or not I ask it to
calculate variable importance. An example is below. I looked briefly
at the source code, but couldn't find anything that would indicate why
calculating variable importance would (or should) change the
predictions.
I'm using randomForest version 4.3-3 (the latest from CRAN), and tried
R 1.9.0, 1.9.1 and 2.0.0 on Windows XP, and R 1.9.1 on Solaris 8.
Thanks,
Scott Gilpin
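library(randomForest)

# Simulate training data: 100 observations, 10 predictors,
# only the first five of which affect the response.
set.seed(2863)
x <- matrix(runif(1000), ncol = 10)
colnames(x) <- 1:10
beta <- matrix(c(1, 2, 3, 4, 5, 0, 0, 0, 0, 0), ncol = 1)
y <- drop(x %*% beta + rnorm(100))

# Independent test set drawn from the same model.
newx <- matrix(runif(1000), ncol = 10)
newy <- drop(newx %*% beta + rnorm(100))

# Fit without variable importance; print the test MSE after all 500 trees.
set.seed(2863)
rf.fit <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = FALSE)
print(rf.fit$test$mse[500])

# Same seed, but with variable importance; the printed test MSE differs.
set.seed(2863)
rf.fit <- randomForest(x = x, y = y, xtest = newx, ytest = newy, importance = TRUE)
print(rf.fit$test$mse[500])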
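
One way to narrow this down, as a rough sketch only: compare the whole
test-MSE trajectory of the two fits rather than just the 500th value,
and see at which tree they first diverge. My unconfirmed guess is that
the importance calculation draws extra random numbers, so the two runs
stop sharing the same random stream partway through; I have not
verified this in the source. The names fit.noimp, fit.imp, mse.diff and
first.diff are made up for illustration, and giving newx the same
column names as x is my own addition.

```r
library(randomForest)

# Same simulated data as in the example above.
set.seed(2863)
x <- matrix(runif(1000), ncol = 10)
colnames(x) <- 1:10
beta <- matrix(c(1, 2, 3, 4, 5, 0, 0, 0, 0, 0), ncol = 1)
y <- drop(x %*% beta + rnorm(100))
newx <- matrix(runif(1000), ncol = 10)
colnames(newx) <- colnames(x)  # assumption: keep names consistent with x
newy <- drop(newx %*% beta + rnorm(100))

# Two fits from the same seed, differing only in the importance flag.
set.seed(2863)
fit.noimp <- randomForest(x = x, y = y, xtest = newx, ytest = newy,
                          importance = FALSE)
set.seed(2863)
fit.imp <- randomForest(x = x, y = y, xtest = newx, ytest = newy,
                        importance = TRUE)

# Absolute difference in cumulative test MSE after each tree.
mse.diff <- abs(fit.noimp$test$mse - fit.imp$test$mse)

# Index of the first tree at which the two runs disagree
# (NA if they never differ beyond the tolerance).
first.diff <- which(mse.diff > 1e-12)[1]
print(first.diff)
print(max(mse.diff))
```

If first.diff comes out greater than 1, that would be consistent with
the two runs starting identically and then drifting apart, but it would
not by itself show why.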