[R] Random Forests Variable Importance Question

Paul Fisch fischp at gmail.com
Mon Apr 13 11:04:03 CEST 2009


I am trying to use the randomForest package for classification in R.

The Variable Importance Measures listed are:

-mean raw importance score of variable x for class 0

-mean raw importance score of variable x for class 1

-MeanDecreaseAccuracy

-MeanDecreaseGini
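
For reference, the four measures above are the columns of the matrix
returned by importance() on a fitted forest. The following is only a
minimal sketch, assuming the randomForest package is installed; the
two-class data built from the built-in iris data set and the names
dat, y, rf and imp are illustrative and do not come from the original
question.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(42)  # forests are random; fix the seed for repeatable numbers

# Illustrative two-class problem: is the flower 'virginica' (1) or not (0)?
dat <- iris[, 1:4]
dat$y <- factor(as.integer(iris$Species == "virginica"))

# importance = TRUE is needed to get the per-class and accuracy-based columns
rf <- randomForest(y ~ ., data = dat, importance = TRUE)

# One row per variable; one column per class plus the two summary measures
imp <- importance(rf)
print(colnames(imp))

# Show the variables ordered from largest to smallest MeanDecreaseAccuracy
print(imp[order(imp[, "MeanDecreaseAccuracy"], decreasing = TRUE), ])

# Dot charts of MeanDecreaseAccuracy and MeanDecreaseGini
varImpPlot(rf)
```

The exact numbers change with the seed and the data, so the sketch
only shows where the columns come from and how to line the variables
up against each other within one fitted forest.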

Now I know what these "mean", in the sense that I know their
definitions. What I want to know is how to use them.

What I am trying to figure out is what these values mean purely in
practical terms: how accurate they are, what is a good value, what is
a bad value, what the maximums and minimums are, etc.

If a variable has a high MeanDecreaseAccuracy or MeanDecreaseGini,
does that mean it is important or unimportant? Also, any information
on the raw scores would be really helpful too. I want to know
everything there is to know about these numbers that is relevant to
applying them.

I don't really want a technical explanation that uses words like
'error', 'summation', or 'permuted', but rather a simpler
explanation that doesn't involve any discussion of how random forests
work (I have read all about that and didn't find it very helpful).

For instance, if I wanted someone to explain to me how to use a radio,
I wouldn't expect the explanation to involve how a radio converts
radio waves into sound.

If anyone can help me out at all, it would be really great. I have
read many, many lectures on random forests and other data mining
topics, but I have never found simple answers about how to read the
variable importance measures.

Thanks,
Paul Fisch



