[R] random forest problem when calculating variable importance
Liaw, Andy
andy_liaw at merck.com
Thu Oct 14 22:28:02 CEST 2004
Are the results dramatically different?
Some difference is expected: setting importance=TRUE makes many additional
calls to the random number generator (to permute the OOB data for each
variable), so every tree after the first is grown from a different random
state than it would be with importance=FALSE.
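
A minimal sketch of that effect, using base R only. The function name
grow_two_trees is hypothetical, and runif() and sample() are just stand-ins
for the random draws made while growing a tree and while permuting OOB data;
this is not the randomForest internals.

```r
# Illustration only: extra RNG calls between two "trees" shift the random
# stream seen by the second one, even though the seed is the same.
grow_two_trees <- function(permute_between) {
  set.seed(2863)
  tree1 <- runif(5)          # draws consumed by the first "tree"
  if (permute_between) {
    invisible(sample(100))   # extra draws, as a permutation step would make
  }
  tree2 <- runif(5)          # draws consumed by the second "tree"
  list(tree1 = tree1, tree2 = tree2)
}

without_perm <- grow_two_trees(FALSE)
with_perm    <- grow_two_trees(TRUE)

# The first "tree" sees the same draws either way.
same_first  <- identical(without_perm$tree1, with_perm$tree1)
# The second starts from a different RNG state, so its draws should differ.
same_second <- identical(without_perm$tree2, with_perm$tree2)
print(c(first = same_first, second = same_second))
```

Neither fit is wrong in this picture: the two runs are simply different
random draws of the forest, so the test MSE values should differ only by
that randomness.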
Cheers,
Andy
> From: Scott Gilpin
>
> Hi -
>
> When using the randomForest function for regression, I get different
> results for mean-squared error of the predictions depending on whether
> or not I specify to calculate variable importance. There is an
> example below. I looked briefly at the source code, but couldn't find
> anything that would indicate why calculating variable importance would
> (or should) change predictions.
>
> I'm using randomForest version 4.3-3 (the latest from CRAN), and tried
> R 1.9.0, 1.9.1 and 2.0.0 on Windows XP, and R 1.9.1 on Solaris 8.
>
> Thanks,
> Scott Gilpin
>
> library(randomForest)
> set.seed(2863)
> x<-matrix(runif(1000),ncol=10)
> colnames(x)<-1:10
> beta<-matrix(c(1,2,3,4,5,0,0,0,0,0),ncol=1)
> y<-drop(x %*% beta + rnorm(100))
> newx<-matrix(runif(1000),ncol=10)
> newy<-drop(newx %*% beta + rnorm(100))
>
> set.seed(2863)
> rf.fit <- randomForest(x=x,y=y,xtest=newx,ytest=newy,importance=FALSE)
> print(rf.fit$test$mse[500])
>
> set.seed(2863)
> rf.fit <- randomForest(x=x,y=y,xtest=newx,ytest=newy,importance=TRUE)
> print(rf.fit$test$mse[500])