[R] How to measure/rank "variable importance" when using rpart?

Liaw, Andy andy_liaw at merck.com
Mon Jan 24 16:21:51 CET 2011

Check out caret::varImp.rpart().  It's described in the original CART
book.
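
For what it's worth, here is a minimal sketch of how this might look on
data like Tal's. It assumes a reasonably recent rpart, which stores an
importance vector in fit$variable.importance (improvements from primary
splits plus a weighted contribution from surrogate splits), and it only
calls caret::varImp() if caret happens to be installed. The names dat and
imp are just illustrative, not part of either package.

```r
library(rpart)

# Simulated data following the structure of the example in the question
set.seed(31431)
n <- 400
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                x4 = rnorm(n), x5 = rnorm(n))
y <- sample(letters[1:4], n, replace = TRUE)
y <- ifelse(X$x2 < -1, "b", y)   # x2 < -1 forces class "b"
y <- ifelse(X$x1 < 0, "a", y)    # x1 < 0 forces class "a" (overrides "b")
dat <- cbind(y = factor(y), X)   # 'dat' is an illustrative name

fit <- rpart(y ~ ., data = dat, method = "class")

# Importance as stored on the fitted object
# (assumption: the installed rpart populates this component)
imp <- fit$variable.importance
print(round(100 * imp / sum(imp), 1))   # rescaled to percentages

# Optional comparison with caret, only if it is installed
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::varImp(fit))
}
```

The two measures need not agree exactly, since they do not necessarily
treat surrogate and competing splits the same way, so compare the rankings
rather than the raw numbers.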

Andy

From: Tal Galili
>
> Hello all,
>
> When building a CART model (specifically a classification tree) using
> rpart, it is sometimes useful to know the importance of the various
> variables introduced to the model.
>
> Thus, my question is: *What common measures exist for ranking/measuring
> the importance of the variables participating in a CART model? And how
> can they be computed using R (for example, with the rpart package)?*
>
> For example, here is some dummy code, created so you can demonstrate your
> solutions on it. The example is structured so that variables x1 and x2
> are clearly "important", while (in some sense) x1 is more important than
> x2 (since x1 applies to more cases and therefore has more influence on
> the structure of the data than x2).
>
> set.seed(31431)
> n <- 400
> x1 <- rnorm(n)
> x2 <- rnorm(n)
> x3 <- rnorm(n)
> x4 <- rnorm(n)
> x5 <- rnorm(n)
> X <- data.frame(x1, x2, x3, x4, x5)
> y <- sample(letters[1:4], n, replace = TRUE)
> y <- ifelse(X[, 2] < -1, "b", y)  # x2 < -1 forces class "b"
> y <- ifelse(X[, 1] < 0, "a", y)   # x1 < 0 forces class "a" (overrides "b")
> library(rpart)
> fit <- rpart(y ~ ., X)
> plot(fit); text(fit)
> # Placeholder for your function: tells us how important each variable is
> info.gain.rpart(fit)
>
> (references are always welcome)
>
>
> Thanks!
>
> Tal
>
> ----------------Contact Details:----------------
> Contact me: Tal.Galili at gmail.com | 972-52-7275845
> Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> www.r-statistics.com (English)
> ------------------------------------------------